
Database Management Systems

1. Introduction to DBMS

A Database is a collection of interrelated data and a Database Management System (DBMS) is a set of programs to use and/or modify this data.

1.1 Approaches to Data Management

File-Based Systems
Conventionally, before database systems evolved, data in software systems was stored in and represented using flat files.

Database Systems
Database systems evolved in the late 1960s to address common issues in applications handling large volumes of data which are also data intensive. Some of these issues could be traced back to the following disadvantages of file-based systems.

Drawbacks of File-Based Systems

As shown in the figure, in a file-based system, different programs in the same application may be interacting with different private data files. There is no system enforcing any standardized control on the organization and structure of these data files.

Data Redundancy and Inconsistency
Since data resides in different private data files, there are chances of redundancy and resulting inconsistency. For example, in the example shown above, the same customer can have a savings account as well as a mortgage loan. Here the customer details may be duplicated, since the programs for the two functions store their corresponding data in two different data files. This gives rise to redundancy in the customer's data. Since the same data is stored in two files, inconsistency arises if a change made to the data in one file is not reflected in the other.

Unanticipated Queries
In a file-based system, handling sudden/ad-hoc queries can be difficult, since it requires changes in the existing programs.

Data Isolation
Though data used by different programs in the application may be related, they reside in isolated data files.

Concurrent Access Anomalies
In large multi-user systems, the same file or record may need to be accessed by multiple users simultaneously. Handling this in a file-based system is difficult.

Security Problems
In data-intensive applications, security of data is a major concern. Users should be given access only to the required data and not to the whole database. In a file-based system, this can be handled only by additional programming in each application.

Integrity Problems
In any application, there will be certain data integrity rules which need to be maintained. These could be in the form of certain conditions/constraints on the elements of the data records. In the savings bank application, one such integrity rule could be: "Customer ID, which is the unique identifier for a customer record, should be non-empty." There can be several such integrity rules. In a file-based system, all these rules need to be explicitly programmed in the application program.

It may be noted that we are not trying to say that handling the above issues like concurrent access, security, integrity problems, etc. is not possible in a file-based system. The real issue is that, though all these are common concerns for any data-intensive application, each application had to handle all these problems on its own. The application programmer has to worry not only about implementing the application business rules but also about handling these common issues.

1.2 Advantages of Database Systems

As shown in the figure, the DBMS is a central system which provides a common interface between the data and the various front-end programs in the application. It also provides a central location for the whole data in the application to reside. Due to its centralized nature, the database system can overcome the disadvantages of the file-based system, as discussed below.

Minimal Data Redundancy
Since the whole data resides in one central database, the various programs in the application access this common pool of data. Hence data needed by one program need not be duplicated in a separate file for another. This reduces data redundancy. However, this does not mean all redundancy can be eliminated. There could be business or technical reasons for having some amount of redundancy. Any such redundancy should be carefully controlled and the DBMS should be aware of it.

Data Consistency
Reduced data redundancy leads to better data consistency.

Data Integration

Since related data is stored in one single database, enforcing data integrity is much easier. Moreover, the functions of the DBMS can be used to enforce the integrity rules with minimum programming in the application programs.

Data Sharing

Related data can be shared across programs since the data is stored in a centralized manner. Even new applications can be developed to operate against the same data.

Enforcement of Standards
Enforcing standards in the organization and structure of the data files is both required and easy in a database system, since it is one single set of programs which always interacts with the data files.

Application Development Ease
The application programmer need not build the functions for handling issues like concurrent access, security, data integrity, etc. The programmer only needs to implement the application business rules. This brings in ease of application development. Adding additional functional modules is also easier than in file-based systems.

Better Controls
Better controls can be achieved due to the centralized nature of the system.

Data Independence
The architecture of the DBMS can be viewed as a 3-level system comprising the following:
- The internal or physical level, where the data resides.
- The conceptual level, which is the level of the DBMS functions.
- The external level, which is the level of the application programs or the end user.
Data Independence is the isolation of an upper level from changes in the organization or structure of a lower level. For example, if changes in the file organization of a data file do not demand changes in the functions of the DBMS or in the application programs, data independence is achieved. Thus Data Independence can be defined as the immunity of applications to changes in physical representation and access technique. The provision of data independence is a major objective for database systems.

Reduced Maintenance
Maintenance is reduced and easier, again due to the centralized nature of the system.

1.3 Functions of a DBMS

The functions performed by a typical DBMS are the following:

Data Definition

The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints/conditions to be satisfied by the data in each field.

Data Manipulation
Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.

Data Security & Integrity
The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application and hence the application programmer need not code these functions in his/her programs.

Data Recovery & Concurrency
Recovery of data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.

Data Dictionary Maintenance
Maintaining the Data Dictionary, which contains the data definitions of the application, is also one of the functions of a DBMS.

Performance
Optimizing the performance of queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer, which evaluates the different implementations of a query and chooses the best among them.

Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

1.4 Role of the Database Administrator

Typically there are three types of users of a DBMS. They are:
1. The End User, who uses the application. Ultimately, this is the user who actually puts the data in the system to use in business. This user need not know anything about the organization of data at the physical level. She also need not be aware of the complete data in the system. She needs to have access to and knowledge of only the data she is using.
2. The Application Programmer, who develops the application programs. She has more knowledge about the data and its structure, since she has to manipulate the data using her programs. She also need not have access to and knowledge of the complete data in the system.
3. The Database Administrator (DBA), who is like the super-user of the system. The role of the DBA is very important and is defined by the following functions.

Defining the Schema
The DBA defines the schema which contains the structure of the data in the application. The DBA determines what data needs to be present in the system and how this data has to be represented and organized.

Liaising with Users
The DBA needs to interact continuously with the users to understand the data in the system and its use.

Defining Security & Integrity Checks
The DBA finds out about the access restrictions to be defined and defines security checks accordingly. Data integrity checks are also defined by the DBA.

Defining Backup / Recovery Procedures
The DBA also defines procedures for backup and recovery. Defining backup procedures includes specifying what data is to be backed up, the periodicity of taking backups, and also the medium and storage place for the backup data.

Monitoring Performance
The DBA has to continuously monitor the performance of the queries and take measures to optimize all the queries in the application.

1.5 Types of Database Systems

Database systems can be categorised according to the data structures and operators they present to the user. The oldest systems fall into the inverted list, hierarchic and network categories. These are the pre-relational models.

In the Hierarchical Model, different records are inter-related through hierarchical or tree-like structures. A parent record can have several children, but a child can have only one parent. In the figure, there are two hierarchies shown - the first storing the relations between CUSTOMER, ORDERS, CONTACTS and ORDER_PARTS, and the second showing the relation between PARTS, ORDER_PARTS and SALES_HISTORY. The many-to-many relationship is implemented through the ORDER_PARTS segment, which occurs in both the hierarchies. In practice, only one tree stores the ORDER_PARTS segment, while the other has a logical pointer to this segment. IMS (Information Management System) of IBM is an example of a Hierarchical DBMS.

In the Network Model, a parent can have several children and a child can also have many parent records. Records are physically linked through linked-lists. IDMS from Computer Associates International Inc. is an example of a Network DBMS.

In the Relational Model, unlike the Hierarchical and Network models, there are no physical links. All data is maintained in the form of tables consisting of rows and columns. Data in two tables is related through common columns and not physical links or pointers. Operators are provided for operating on rows in tables. Unlike the other two types of DBMS, there is no need to traverse pointers in a Relational DBMS. This makes querying much easier in a Relational DBMS than in a Hierarchical or Network DBMS. This, in fact, is a major reason for the relational model becoming more programmer-friendly and much more dominant and popular in both industrial and academic scenarios. Oracle, Sybase, DB2, Ingres, Informix and MS-SQL Server are a few of the popular Relational DBMSs.

CUSTOMER
CUST. NO.   CUSTOMER NAME     ADDRESS      CITY
15371       Nanubhai & Sons   L. J. Road   Mumbai
...         ...               ...          ...

CONTACTS
CUST. NO.   CONTACT         DESIGNATION
15371       Nanubhai        Owner
15371       Rajesh Munim    Accountant
...         ...             ...

PARTS
PART NO.   PARTS DESC               PART PRICE
S3         Amkette 3.5" Floppies    400.00
...        ...                      ...

ORDERS
ORDER NO.   ORDER DATE     CUSTOMER NO.
3216        24-June-1997   15371
...         ...            ...

ORDER_PARTS
ORDER NO.   PART NO.   QUANTITY
3216        C1         300
3216        S3         120
...         ...        ...

SALES_HISTORY
PART NO.   REGION   YEAR   UNITS
S3         East     1996   2000
S3         North    1996   5500
S3         South    1996   12000
S3         West     1996   20000

Recent developments in the area have shown up in the form of object and object/relational DBMS products. Examples of such systems are GemStone and Versant ODBMS. Research has also proceeded on a variety of other schemes, including the multi-dimensional approach and the logic-based approach.

3-Level Database System Architecture

The External Level represents the collection of views available to different end-users. The Conceptual Level is the representation of the entire information content of the database. The Internal Level is the physical level, which shows how the data is stored, what the representations of the fields are, etc.

2. The Internal Level

This chapter discusses the issues related to how the data is physically stored on the disk and some of the access mechanisms commonly used for retrieving this data. The Internal Level is the level which deals with the physical storage of data. While designing this layer, the main objective is to optimize performance by minimizing the number of disk accesses during the various database operations.

The figure shows the process of database access in general. The DBMS views the database as a collection of records. The File Manager of the underlying Operating System views it as a set of pages and the Disk Manager views it as a collection of physical locations on the disk. When the DBMS makes a request for a specific record to the File Manager, the latter maps the record to a page containing it and requests the Disk Manager for that specific page. The Disk Manager determines the physical location on the disk and retrieves the required page.

2.1 Clustering

In the above process, if the page containing the requested record is already in memory, retrieval from the disk is not necessary. In such a situation, the time taken for the whole operation will be less. Thus, if records which are frequently used together are placed physically together, more such records will be in the same page. Hence the number of pages to be retrieved will be less, and this reduces the number of disk accesses, which in turn gives better performance. This method of storing logically related records physically together is called clustering.

Eg: Consider the CUSTOMER table shown below.

Cust ID   Cust Name   Cust City   ...
10001     Raj         Delhi       ...
10002     ...         ...         ...
10003     ...         ...         ...
10004     ...         ...         ...
...       ...         ...         ...

If queries retrieving customers with consecutive Cust_IDs frequently occur in the application, clustering based on Cust_ID will help improve the performance of these queries. This can be explained as follows. Assume that the customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 KB (1024 bytes). If there is no clustering, it can be assumed that the customer records are stored at random physical locations. In the worst-case scenario, each record may be placed in a different page. Hence a query to retrieve 100 records with consecutive Cust_IDs (say, 10001 to 10100) will require 100 pages to be accessed, which in turn translates to 100 disk accesses. But, if the records are clustered, a page can contain 8 records. Hence the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13, i.e., only 13 disk accesses will be required to obtain the query results. Thus, in the given example, clustering improves the speed by a factor of about 7.7.

Q: For what record size will clustering be of no benefit to improve performance?
A: When the record size and page size are such that a page can contain only one record.

Q: Can a table have clustering on multiple fields simultaneously?
A: No.

Intra-file Clustering
Clustered records belong to the same file (table), as in the above example.

Inter-file Clustering
Clustered records belong to different files (tables). This type of clustering may be required to enhance the speed of queries retrieving related records from more than one table. Here interleaving of records is used.
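Clustering is requested through product-specific DDL rather than standard SQL. A minimal sketch using Oracle-style syntax is given below; the cluster, index and column names are hypothetical and only illustrate the idea for the CUSTOMER example above.

    CREATE CLUSTER cust_cluster (cust_id NUMBER(6));      -- records will be clustered on Cust_ID

    CREATE INDEX i_cust_cluster ON CLUSTER cust_cluster;  -- an index cluster needs a cluster index
                                                          -- before rows are inserted

    CREATE TABLE customer
    ( cust_id   NUMBER(6) PRIMARY KEY,
      cust_name CHAR(30),
      cust_city CHAR(20)
    ) CLUSTER cust_cluster (cust_id);                     -- rows with nearby Cust_IDs share pages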

2.2 Indexing

Indexing is another common method for making retrievals faster.

Consider the example of the CUSTOMER table used above. The following query is based on the customer's city:

Retrieve the records of all customers who reside in Delhi

Here a sequential search on the CUSTOMER table has to be carried out and all records with the value 'Delhi' in the Cust_City field have to be retrieved. The time taken for this operation depends on the number of pages to be accessed. If the records are randomly stored, the number of page accesses depends on the volume of data. If the records are stored physically together, the number of pages also depends on the size of each record. If such queries based on the Cust_City field are very frequent in the application, steps can be taken to improve the performance of these queries. Creating an index on Cust_City is one such method. This results in the scenario shown below.

A new index file is created. The number of records in the index file is the same as that of the data file. The index file has two fields in each record. One field contains the value of the Cust_City field and the second contains a pointer to the actual data record in the CUSTOMER table. Whenever a query based on the Cust_City field occurs, a search is carried out on the index file. Here, it is to be noted that this search will be much faster than a sequential search in the CUSTOMER table, if the index records are stored physically together. This is because of the much smaller size of the index record, due to which each page will be able to contain many more index records. When the records with the value 'Delhi' in the Cust_City field of the index file are located, the pointer in the second field of these records can be followed to directly retrieve the corresponding CUSTOMER records. Thus the access involves a sequential access on the index file and a direct access on the actual data file.

Retrieval Speed v/s Update Speed: Though indexes help make retrievals faster, they slow down updates on the table, since updates on the base table demand updates on the index as well.
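A minimal sketch in SQL (the CREATE INDEX syntax itself is covered in section 4.3; the table and column names are those of the CUSTOMER example above):

    CREATE INDEX i_cust_city ON customer (cust_city);   -- builds the index file on Cust_City

    SELECT *
    FROM   customer
    WHERE  cust_city = 'Delhi';                          -- can now use the index instead of a
                                                         -- full sequential search of the table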

It is possible to create an index with multiple fields i.e., index on field combinations. Multiple indexes can also be created on the same table simultaneously though there may be a limit on the maximum number of indexes that can be created on a table.

Q: In which of the following situations will indexes be ineffective?
a) When the percentage of rows being retrieved is large
b) When the data table is small and the index record is of almost the same size as the actual data record
c) In queries involving NULL / NOT NULL on the indexed field
d) All of the above
A: d) All of the above

Q: Can a clustering based on one field and indexing on another field exist on the same table simultaneously?
A: Yes

2.3 Hashing

Hashing is yet another method used for making retrievals faster. This method provides direct access to a record on the basis of the value of a specific field called the hash field. Here, when a new record is inserted, it is physically stored at an address which is computed by applying a mathematical function (the hash function) to the value of the hash field. Thus for every new record, hash_address = f (hash_field), where f is the hash function.
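As a small sketch of the idea, the expression below uses the hash function that appears in the collision example further on, and assumes the chapter's CUSTOMER table with a numeric cust_id column:

    SELECT cust_id,
           MOD(cust_id, 10000) * 64 + 1025 AS hash_address   -- e.g. 10001 -> 1089, 10002 -> 1153
    FROM   customer;
    -- In a real hash file the DBMS computes this address internally and stores the
    -- record directly at that location; no such SQL is issued by the user.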

Later, when a record is to be retrieved, the same hash function is used to compute the address where the record is stored. Retrievals are faster since direct access is provided and there is no search involved in the process. An example of a typical hash function is a numeric hash field, say an id, modulus a very large prime number.

Q: Can there be more than one hash field on a file?
A: No. As hashing relates the field value to the address of the record, multiple hash fields would map a record to multiple addresses at the same time. Hence there can be only one hash field per file.

Collisions: Consider the example of the CUSTOMER table given earlier while discussing clustering. Let CUST_ID be the hash field and the hash function be defined as ((CUST_ID mod 10000)*64 + 1025). The records with CUST_ID 10001, 10002, 10003 etc. will be stored at addresses 1089, 1153, 1217 etc. respectively. It is possible that two records hash to the same address, leading to a collision. In the above example, records with CUST_ID values 20001, 20002, 20003 etc. will also map on to the addresses 1089, 1153, 1217 etc. respectively. The same is the case with CUST_ID values 30001, 30002, 30003 etc. The methods to resolve a collision are:

1. Linear Search: While inserting a new record, if it is found that the location at the hash address is already occupied by a previously inserted record, search for the next free location available on the disk and store the new record at this location. A pointer from the first record at the original hash address to the new record is also stored. During retrieval, the hash address is computed to locate the record. When it is seen that the record is not available at the hash address, the pointer from the record at that address is followed to locate the required record.

In this method, the overhead incurred is the time taken for the linear search to locate the next free location while inserting a record.

2. Collision Chain: Here, the hash address location contains the head of a list of pointers linking together all records which hash to that address.

In this method, an overflow area needs to be used if the number of records mapping on to the same hash address exceeds the number of locations linked to it.

3.1 The Relational Model

Relational Databases: Case Example

Ord_Aug
Ord#   OrdDate    Cust#
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Items
Item#   Descr          Price
HW1     Power Supply   4000
HW2     101-Keyboard   2000
HW3     Mouse          800
SW1     MS-DOS 6.0     5000
SW2     MS-Word 6.0    8000

Ord_Items
Ord#   Item#   Qty
101    HW1     100
101    HW3     50
101    SW1     150
102    HW2     10
103    HW3     50
104    HW2     25
104    HW3     100
105    SW1     100

Customers
Cust#   CustName     City
001     Shah         Bombay
002     Srinivasan   Madras
003     Gupta        Delhi
004     Banerjee     Calcutta
005     Apte         Bombay

Relational Databases: Terminology

Relation - A table. (Eg: Ord_Aug, Customers, Items etc.)

Tuple - A row or a record in a relation. (Eg: A row from the Customers relation is a Customer tuple.)

Attribute - A field or a column in a relation. (Eg: OrdDate, Item#, CustName etc.)

Cardinality of a relation - The number of tuples in a relation. (Eg: The cardinality of the Ord_Items relation is 8.)

Degree of a relation - The number of attributes in a relation. (Eg: The degree of the Customers relation is 3.)

Domain of an attribute - The set of all values that can be taken by the attribute. (Eg: The domain of Qty in Ord_Items is the set of all values which can represent the quantity of an ordered item.)

Primary Key of a relation - An attribute or a combination of attributes that uniquely identifies each tuple in a relation. (Eg: The primary key of the Customers relation is Cust#; the Ord# and Item# combination forms the primary key of Ord_Items.)

Foreign Key - An attribute or a combination of attributes in one relation R1 which indicates the relationship of R1 with another relation R2. The foreign key attributes in R1 must contain values matching those in R2. (Eg: Cust# in the Ord_Aug relation is a foreign key creating a reference from Ord_Aug to Customers. This is required to indicate the relationship between orders in Ord_Aug and customers. Ord# and Item# in Ord_Items are foreign keys creating references from Ord_Items to Ord_Aug and Items respectively.)

3.2 Properties of Relations

No Duplicate Tuples
A relation cannot contain two or more tuples which have the same values for all the attributes, i.e., in any relation, every row is unique.

Tuples are Unordered
The order of rows in a relation is immaterial.

Attributes are Unordered
The order of columns in a relation is immaterial.

Attribute Values are Atomic
Each tuple contains exactly one value for each attribute.

It may be noted that many of the properties of relations follow from the fact that the body of a relation is a mathematical set.

3.3 Integrity Rules

The following are the integrity rules to be satisfied by any relation.
- No component of the primary key can be null.
- The database must not contain any unmatched foreign key values. This is called the referential integrity rule.

Q: Can the foreign key accept nulls?
A: Yes, if the application business rule allows this.

How do we explain this? Unlike the case of primary keys, there is no integrity rule saying that no component of the foreign key can be null. This can be logically explained with the help of the following example. Consider the relations Employee and Account as given below.

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

EmpAcc# in the Employee relation is a foreign key creating a reference from Employee to Account. Here, a Null value in the EmpAcc# attribute is logically possible if an employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a Null value can be allowed for EmpAcc# in the Employee relation. In the case example given earlier, Cust# in Ord_Aug cannot accept Null if the business rule insists that the customer number needs to be stored for every order placed.

The next issue related to foreign key references is the handling of deletes/updates of the parent record. In the case example, can we delete the record with Cust# value 002, 003 or 005? The default answer is NO, as long as there is a foreign key reference to these records from some other table. Here, the records are referenced from the order records in the Ord_Aug relation. Hence the default is to Restrict the deletion of the parent record. Deletion can still be carried out if we use the Cascade or Nullify strategies.

Cascade: Delete/Update all the references successively, or in a cascaded fashion, and finally delete/update the parent record. In the case example, the Customer record with Cust# 002 can be deleted after deleting the order records with Ord# 101 and 104. But these order records, in turn, can be deleted only after deleting the records with Ord# 101 and 104 from the Ord_Items relation.

Nullify: Update the referencing foreign key to Null and then delete/update the parent record. In the above example of the Employee and Account relations, an account record may have to be deleted if the account is to be closed. For example, if employee Raj decides to close his account, the Account record with Acc# 120002 has to be deleted. But this deletion is not possible as long as the Employee record of Raj references it. Hence the strategy can be to update the EmpAcc# field in the Employee record of Raj to Null and then delete the Account record with Acc# 120002. After the deletion, the data in the tables will be as follows:

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      Null
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120003   01-Jan-1999   3000
120004   04-Mar-1999   500
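A sketch of how such strategies can be declared to the DBMS is given below. The ON DELETE clauses are standard SQL referential actions, though support and syntax details vary by product; the column types are assumed from the case-example data.

    CREATE TABLE ord_aug
    ( ord#    NUMBER(6) PRIMARY KEY,
      orddate DATE,
      cust#   NUMBER(6) REFERENCES customers (cust#)
                ON DELETE CASCADE            -- Cascade strategy: deleting a customer
                                             -- also deletes the referencing orders
    );

    CREATE TABLE employee
    ( emp#    CHAR(4) PRIMARY KEY,
      empname CHAR(30),
      empcity CHAR(20),
      empacc# NUMBER(6) REFERENCES account (acc#)
                ON DELETE SET NULL           -- Nullify strategy: deleting an account
                                             -- sets EmpAcc# to Null in Employee
    );
    -- With no ON DELETE clause, the default behaviour is to Restrict the deletion.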

3.4 Relational Algebra Operators

The eight relational algebra operators are:

1. SELECT To retrieve specific tuples/rows from a relation.

Ord#   OrdDate    Cust#
101    02-08-94   002
104    28-08-94   002

2. PROJECT To retrieve specific attributes/columns from a relation.

Descr          Price
Power Supply   4000
101-Keyboard   2000
Mouse          800
MS-DOS 6.0     5000
MS-Word 6.0    8000

3. PRODUCT To obtain all possible combinations of tuples from two relations.

Ord#   OrdDate    O.Cust#   C.Cust#   CustName     City
101    02-08-94   002       001       Shah         Bombay
101    02-08-94   002       002       Srinivasan   Madras
101    02-08-94   002       003       Gupta        Delhi
101    02-08-94   002       004       Banerjee     Calcutta
101    02-08-94   002       005       Apte         Bombay
102    11-08-94   003       001       Shah         Bombay
102    11-08-94   003       002       Srinivasan   Madras

4. UNION To retrieve tuples appearing in either or both the relations participating in the UNION.

Eg: Consider the relation Ord_Jul as follows:

Ord_Jul
Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003

Ord_Aug UNION Ord_Jul gives:

Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Note: The union operation shown above logically implies retrieval of records of orders placed in July or in August.

5. INTERSECT To retrieve tuples appearing in both the relations participating in the INTERSECT.

Eg: To retrieve Cust# of customers who've placed orders in July and in August:

Cust#
003

6. DIFFERENCE To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not the second.

Eg: To retrieve Cust# of customers who've placed orders in July but not in August:

Cust#
001

7. JOIN To retrieve combinations of tuples in two relations based on a common field in both the relations.

Eg: ORD_AUG join CUSTOMERS (here, the common column is Cust#)

Ord#   OrdDate    Cust#   CustName     City
101    02-08-94   002     Srinivasan   Madras
102    11-08-94   003     Gupta        Delhi
103    21-08-94   003     Gupta        Delhi
104    28-08-94   002     Srinivasan   Madras
105    30-08-94   005     Apte         Bombay

Note: The above join operation logically implies retrieval of the details of all orders along with the details of the corresponding customers who placed the orders. Such a join operation, where only those rows having corresponding rows in both the relations are retrieved, is called the natural join or inner join. This is the most common join operation.

Consider the example of the EMPLOYEE and ACCOUNT relations.

EMPLOYEE
Emp#   EmpName   EmpCity   Acc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

ACCOUNT
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000

Note that, from each table, only those records which have corresponding records in the other table appear in the result set. This means that the result of the inner join shows the details of those employees who hold an account, along with the account details. The other type of join is the outer join, which has three variations: the left outer join, the right outer join and the full outer join. These three joins are explained as follows: The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.

EMPLOYEE left outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000

The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.

EMPLOYEE right outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500

(Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here)

The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, related columns will take null values.

EMPLOYEE full outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500

Q: What will be the result of a natural join operation between R1 and R2?
A: a1 a2 a3 b1 b2 b3 c1 c2 c3

8. DIVIDE Consider the following three relations:

R1 divided by R2 per R3 gives:

a

Thus the result contains those values from R1 whose corresponding R2 values in R3 include all the R2 values.
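The relations R1, R2 and R3 referred to above are in a figure that is not reproduced here. As a sketch of how DIVIDE is typically expressed in SQL, the query below uses the case-example tables instead and finds the customers whose orders, taken together, cover every item in the Items table (a double NOT EXISTS):

    SELECT DISTINCT c.cust#
    FROM   customers c
    WHERE  NOT EXISTS (                       -- there is no item ...
             SELECT i.item#
             FROM   items i
             WHERE  NOT EXISTS (              -- ... that this customer has never ordered
                      SELECT 1
                      FROM   ord_aug o, ord_items oi
                      WHERE  o.cust#  = c.cust#
                      AND    oi.ord#  = o.ord#
                      AND    oi.item# = i.item#
                    )
           );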

4. Structured Query Language (SQL)

4.1 SQL: An Overview

The components of SQL are:
a. Data Manipulation Language - Consists of SQL statements for operating on the data (inserting, modifying, deleting and retrieving data) in tables which already exist.
b. Data Definition Language - Consists of SQL statements for defining the schema (creating, modifying and dropping tables, indexes, views etc.)
c. Data Control Language - Consists of SQL statements for providing and revoking access permissions to users.

Tables used:

Ord_Aug
Ord#   OrdDate     Cust#
101    02-AUG-94   002
102    11-AUG-94   003
103    21-AUG-94   003
104    28-AUG-94   002
105    30-AUG-94   005

Items
Item#   Descr          Price
HW1     Power Supply   4000
HW2     101-Keyboard   2000
HW3     Mouse          800
SW1     MS-DOS 6.0     5000
SW2     MS-Word 6.0    8000

Ord_Items
Ord#   Item#   Qty
101    HW1     100
101    HW3     50
101    SW1     150
102    HW2     10
103    HW3     50
104    HW2     25
104    HW3     100
105    SW1     100

Customers
Cust#   CustName     City
001     Shah         Bombay
002     Srinivasan   Madras
003     Gupta        Delhi
004     Banerjee     Calcutta
005     Apte         Bombay

4.2 DML

SELECT, INSERT, UPDATE and DELETE statements.

The SELECT statement

Retrieves rows from one or more tables according to given conditions. General form:

SELECT [ ALL | DISTINCT ] <attribute (comma)list>
FROM <table (comma)list>
[ WHERE <conditional expression> ]
[ GROUP BY <attribute (comma)list> ]
[ HAVING <conditional expression> ]
[ ORDER BY <attribute list> [DESC] ];
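As a quick sketch of how these clauses combine in a single statement (using the case-example tables; the individual clauses are illustrated one by one in the queries that follow):

    SELECT   cust#, COUNT(*) "No. of Orders"
    FROM     ord_aug
    WHERE    orddate >= '01-AUG-94'
    GROUP BY cust#
    HAVING   COUNT(*) > 1
    ORDER BY cust#;
    -- for customers with more than one order in August, lists the number of orders placed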

Some SELECT statements on the Case Example

Query 1:
SELECT * <---------------- * denotes all attributes in the table
FROM items;
Result

Query 2:
SELECT cust#, custname
FROM customers;
Result

Query 3:
SELECT DISTINCT item#
FROM ord_items;
Result

Query 4:
SELECT ord# "Order", orddate "Ordered On" <-- In the result set the column headings will appear as Order and Ordered On instead of ord# and orddate.
FROM ord_aug;

Result

Query 5:
SELECT item#, descr
FROM items
WHERE price>2000;
Result

Query 6:
SELECT custname
FROM customers
WHERE city<>'Bombay';
Result

Query 7:
SELECT custname
FROM customers
WHERE UPPER(city)<>'BOMBAY';
Result

Query 8:
SELECT *
FROM ord_aug
WHERE orddate > '15-AUG-94'; <--------- Illustrates the use of 'date' fields. In SQL, a separate datatype (eg: date, datetime etc.) is available to store data which is of type date.
Result

Query 9:
SELECT *
FROM ord_items
WHERE qty BETWEEN 100 AND 200;
Result

Query 10:
SELECT custname
FROM customers
WHERE city IN ('Bombay', 'Madras'); <----- The conditional expression evaluates to TRUE for those records for which the value of the city field is in the list ('Bombay', 'Madras').

Result Query 11:

SELECT custname FROM customers WHERE custname LIKE 'S%' ; <-----------LIKE 'S%' - 'S' followed by zero or more characters

Result Query 12: SELECT * FROM ord_items WHERE qty>100 AND item# LIKE 'SW%'; Result Query 13: SELECT custname FROM customers WHERE city='Bombay' OR city='Madras'; Result

Query 14: SELECT * FROM customers WHERE city='Bombay' ORDER BY custname; <--------------------

Records in the result set are displayed in the ascending order of custname

Result Query 15:

SELECT *
FROM ord_items
ORDER BY item#, qty DESC; <------------ Displays the result set in the ascending order of item#. If there is more than one record with the same item#, they will be displayed in the descending order of qty.

Result Query 16: SELECT descr, price

FROM items
ORDER BY 2; <--------------------------- ORDER BY the 2nd attribute (price) in the attribute list of the SELECT clause
Result
Query 17:

SELECT ord#, ord_aug.cust#, custname <---------------FROM ord_aug, customers WHERE city='Delhi' AND ord_aug.cust# = customers.cust#; <----------------

SELECT statement implementing JOIN operation.

JOIN condition

Result

Query 18:
SELECT ord#, customers.cust#, city
FROM ord_aug, customers
WHERE ord_aug.cust# = customers.cust#;
Result

Query 19:
SELECT ord#, customers.cust#, city
FROM ord_aug, customers
WHERE ord_aug.cust# = customers.cust# (+); <---------- (+) indicates an outer join. The (+) is placed after customers.cust#, so orders without a matching customer row are still retrieved, with the customers columns taking null values.
Result

Nested SELECT statements

SQL allows nesting of SELECT statements. In a nested SELECT statement the inner SELECT is evaluated first and is replaced by its result to evaluate the outer SELECT statement.

Query 20:
SELECT item#, descr, price <-------------------------- Outer SELECT statement
FROM items
WHERE price > (SELECT AVG(price) FROM items); <----- Inner SELECT statement
Result

Query 21:
SELECT cust#, custname
FROM customers
WHERE city = (SELECT city FROM customers WHERE custname='Shah');
Here the outer SELECT is evaluated as
SELECT cust#, custname FROM customers WHERE city = 'Bombay'
Result

Arithmetic Expressions
+ - * / ()
Arithmetic functions are allowed in SELECT and WHERE clauses.

Query 22:
SELECT descr, price, price*0.1 "discount"
FROM items
WHERE price >= 4000

ORDER BY 3; Result Query 23: SELECT descr FROM items, ord_items WHERE price*qty > 250000 and items.item# = ord_items.item#; Result Numeric Functions Query 24: SELECT qty, ROUND(qty/2,0) "qty supplied" FROM ord_items WHERE item#='HW2'; Result Query 25: SELECT qty, TRUNC(qty/2,0) "qty supplied" FROM ord_items WHERE item#='HW2'; Result Examples of Numeric Functions

MOD(n,m) SQRT(n) ROUND(n,m) TRUNC(n,m)

'm' indicates the number of digits after the decimal point in the result.

Date Arithmetic

Date + no. of days
Date - no. of days
Date - Date

Query 26: SELECT ord#, orddate+15 "Supply by" FROM ord_aug; Result Date Functions MONTHS_BETWEEN(date1, date2) ADD_MONTHS(date, no. of months) SYSDATE Returns system date. Query 27: SELECT ord#, MONTHS_BETWEEN(SYSDATE,orddate) FROM ord_aug; Result Query 28: SELECT TO_CHAR(orddate,' DD/MM/YYYY') <-FROM ord_aug;

Converts the value of the date field orddate to character string of the format DD/MM/YYYY

Result

Note: DD - day of month (1-31) D - day of week (1-7) DAY - name of day MM - month (01-12) MONTH - name of month MON - abbreviated name of month HH:MI:SS - hours:minutes:seconds fm - fill mode : suppress blank padding

Character Expressions & Functions || - Concatenate operator

Query 29: SELECT custname || ' - ' || city FROM customers; Result Examples of Character Functions: INITCAP(string) UPPER(string) LOWER(string) SUBSTR(string,start,no. of characters) Group Functions Group functions are functions which act on the entire column of selected rows.

Query 30:
SELECT SUM(qty), AVG(qty) <-------------- SUM and AVG are examples of group functions. They compute the sum/average of the qty values of all rows where item#='SW1'.
FROM ord_items
WHERE item#='SW1';
Result

Examples of Group Functions: SUM AVG COUNT MAX MIN

Query 31:
SELECT item#, SUM(qty)
FROM ord_items
GROUP BY item#; <------------------------ GROUP BY clause used to group rows according to the value of item# in the result. The SUM function acts individually on each group of rows.
Result

Query 32:
SELECT item#, SUM(qty)
FROM ord_items
GROUP BY item#
HAVING SUM(qty)>100; <----------------- HAVING clause used to apply a condition on the grouped rows and display the final result.
Result

Query 33:
SELECT item#, SUM(qty)
FROM ord_items
GROUP BY item#
HAVING COUNT(*)>2;
Result

The INSERT statement Inserts one or more tuples in a table. General forms: To insert a single tuple INSERT INTO <table-name> [<attribute (comma)list>] VALUES <value list>; To insert multiple tuples INSERT INTO <table-name> [<attribute (comma)list>] SELECT [ ALL | DISTINCT ] <attribute (comma)list> FROM <table (comma)list>* [ WHERE <conditional expression>]; * - list of existing tables Sample INSERT statements from the Case Example Query 34: Insert all values for a new row INSERT INTO customers <------------------VALUES (006, 'Krishnan', 'Madras'); Inserts a single row in Customers Table. Attribute list need not be mentioned if values are given for all attributes in the tuple.

Query 35: Insert values of item# & descr columns for a new row

INSERT INTO items (item#, descr) <---------VALUES ('HW4', '132-DMPrinter');

Attribute list mentioned since values are not given for all attributes in the tuple. Here Price column for the newly inserted tuple takes NULL value.

Query 36: Inserts a new row which includes a date field INSERT INTO ord_aug VALUES(106, '31-AUG-94', 005); Query 37: Inserts a new row with the date field being specified in non DD-MON-YY format INSERT INTO ord_aug VALUES (106, TO_DATE('310894','DDMMYY'), 005); The UPDATE statement Updates values of one or more attributes of one or more tuples in a table. General form: UPDATE <table-name> SET <attribute-1 = value-1[, attribute-2 = value-2,...attribute-n = value-n] [ WHERE <conditional expression>]; Sample UPDATE statements from the Case Example Query 38: changes price of itmem SW1 to 6000 UPDATE items SET price = 6000 WHERE item# ='SW1'; Query 39: Changes a wrongly entered item# from HW2 to SW2 UPDATE ord_items SET item# = 'SW2' WHERE ord#=104 AND item# = 'HW2'; The DELETE statement

Deletes one or more tuples in a table according to given conditions General form: DELETE FROM <table-name> [ WHERE <conditional expression>]; Sample DELETE statements from the Case Example Query 40: Deletes Customer record with Customer Number 004 DELETE FROM customers WHERE cust# = 004; DELETE FROM Ord_Items; <------------------Deletes all rows in Ord_Items Table. The table remains empty after the DELETE operation.

4.3 DDL

CREATE, ALTER, and DROP statements. DDL statements are those which are used to create, modify and drop the definitions or structures of various tables, views, indexes and other elements of the DBMS.

The CREATE TABLE statement
Creates a new table. General form:
CREATE TABLE <table-name> (<table-element (comma)list>*);
* - a table element may be an attribute with its data type and size, or any integrity constraint on attributes.

Some CREATE TABLE statements on the Case Example
Query:
CREATE TABLE customers
( cust# NUMBER(6) NOT NULL,
  custname CHAR(30),

  city CHAR(20));
- This query creates a table CUSTOMERS with 3 fields - cust#, custname and city. Cust# cannot be null.

Query:
CREATE TABLE ord_sep <------------------ Creates a new table ord_sep, which has the same structure as ord_aug. The data in ord_aug is copied to the new table ord_sep.
AS SELECT * from ord_aug;
- This query creates table ORD_SEP as a copy of ORD_AUG. Copies structure as well as data.

Query:
CREATE TABLE ord_sep <------------------ Creates a new table ord_sep, which has the same structure as ord_aug. No data in ord_aug is copied to the new table since there is no row which satisfies the 'always false' condition 1 = 2.
AS SELECT * from ord_aug
WHERE 1 = 2;

- This query creates table ORD_SEP as a copy of ORD_AUG, but does not copy any data as the WHERE clause is never satisfied.

The ALTER TABLE statement
Alters the structure of an existing table. General form:
ALTER TABLE <table-name> ADD | MODIFY (<table-element (comma)list>);
Examples of the ALTER TABLE statement:
Query:

ALTER TABLE customers MODIFY custname CHAR(35); <------------Modifies the data type/size of an attribute in the table

- This query changes the custname field to a character field of length 35. Used for modifying field lengths and attributes.

Query: ALTER TABLE customers ADD (phone number(8), <-----------------credit_rating char(1));

Adds two new attributes to the Customers table. Here, for existing tuples (if any), the new attribute will take NULL values since no DEFAULT value is mentioned for the attribute.

- This query adds two new fields - phone & credit_rating to the customers table.

The DROP TABLE statement
Drops an existing table. General form:
DROP TABLE <table-name>;
Example:
Query:
DROP TABLE ord_sep;
- The above query drops table ORD_SEP from the database.

Creating & Dropping Views
A view is a virtual relation created with attributes from one or more base tables. SELECT * FROM myview1; at any given time will evaluate the view-defining query in the CREATE VIEW statement and display the result.

Query:
CREATE VIEW myview1 AS
SELECT ord#, orddate, ord_aug.cust#, custname
FROM ord_aug, customers
WHERE ord_aug.cust# = customers.cust#;
- This query defines a view consisting of ord#, orddate, cust# and custname using a join of the ORD_AUG and CUSTOMERS tables.
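As a brief usage sketch, a view is queried like any table; the DBMS evaluates the view-defining query and applies the additional conditions to its result:

    SELECT *
    FROM   myview1
    WHERE  custname = 'Gupta';   -- orders placed by customer Gupta, retrieved via the view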

Query:
CREATE VIEW myview2 (ItemNo, Quantity) AS
SELECT item#, qty
FROM ord_items;
- This query defines a view with columns item# and qty from the ORD_ITEMS table, and renames these columns as ItemNo and Quantity respectively.

Query:

CREATE VIEW myview3 AS SELECT item#, descr, price FROM items WHERE price < 1000 WITH CHECK OPTION; <------------------WITH CHECK OPTION in a CREATE VIEW statement indicates that INSERTs or UPDATEs on the view will be rejected if they violate any integrity constraint implied by the view-defining query.

- This query defines the view myview3 as above. WITH CHECK OPTION ensures that if this view is used for updates, the updated values do not cause the row to fall outside the view.

Query:
DROP VIEW myview1; <---- To drop a view
- This query drops the view MYVIEW1.

Creating & Dropping Indexes

Query: CREATE INDEX i_city <-------------------ON customers (city); Creates a new index named i_city. The new index file(table) will have the values of city column of Customers table

Query:
CREATE UNIQUE INDEX i_custname <---- Creates an index which allows only unique values for custnames
ON customers (custname);

Query:
CREATE INDEX i_city_custname <-------- Creates an index based on two fields: city and custname
ON customers (city, custname);

Query:
DROP INDEX i_city; <-------------------- Drops index i_city

4.4 DCL

GRANT and REVOKE statements. DCL statements are those which are used to control access permissions on the tables, indexes, views and other elements of the DBMS.

Granting & Revoking Privileges

Query:
GRANT ALL <----------------- Grants all permissions on the table customers to the user who logs in as 'ashraf'.
ON customers
TO ashraf;

Query:
GRANT SELECT <------------- Grants SELECT permission on the table customers to the user 'sunil'. User 'sunil' does not have permission to insert, update, delete or perform any other operation on the customers table.
ON customers
TO sunil;

Query:
GRANT SELECT
ON customers
TO sunil
WITH GRANT OPTION; <------- Enables user 'sunil' to give SELECT permission on the customers table to other users.

Query:
REVOKE DELETE <------------ Takes away DELETE permission on the customers table from user 'ashraf'.
ON customers
FROM ashraf;

5. Recovery and Concurrency

Recovery and Concurrency in a DBMS are part of the general topic of transaction management. Hence we shall begin the discussion by examining the fundamental notion of a transaction. 5.1 Transaction A transaction is a logical unit of work. Consider the following example: The procedure for transferring an amount of Rs. 100/- from the account of one customer to another is given.

    EXEC SQL WHENEVER SQLERROR GOTO UNDO;

    EXEC SQL UPDATE DEPOSIT
             SET    BALANCE = BALANCE - 100
             WHERE  CUSTID  = :from_cust;

    EXEC SQL UPDATE DEPOSIT
             SET    BALANCE = BALANCE + 100
             WHERE  CUSTID  = :to_cust;

    EXEC SQL COMMIT;
    GOTO FINISH;

UNDO:
    EXEC SQL ROLLBACK;

FINISH:
    RETURN;

Here, it has to be noted that the single operation amount transfer involves two database updates updating the record of from_cust and updating the record of to_cust. In between these two updates the database is in an inconsistent (or incorrect in this example) state. i.e., if only one of the updates is performed, one cannot say by seeing the database contents whether the amount transfer operation has been done or not. Hence to guarantee database consistency it has to be ensured that either both updates are performed or none are performed. If, after one update and before the next update, something goes wrong due to problems like a system crash, an overflow error, or a violation of an integrity constraint etc., then the first update needs to be undone. This is true with all transactions. Any transaction takes the database from one consistent state to another. It need not necessarily preserve consistency of database at all intermediate points. Hence it is important to ensure that either a transaction executes in its entirety or is totally cancelled. The set of programs which handles this forms the

transaction manager in the DBMS. The transaction manager uses the COMMIT and ROLLBACK operations for ensuring the atomicity of transactions.

COMMIT
The COMMIT operation indicates successful completion of a transaction, which means that the database is in a consistent state and all updates made by the transaction can now be made permanent. If a transaction successfully commits, the system guarantees that its updates will be permanently installed in the database even if the system crashes immediately after the COMMIT.

ROLLBACK
The ROLLBACK operation indicates that the transaction has been unsuccessful, which means that all updates done by the transaction till then need to be undone to bring the database back to a consistent state. To help undo the updates once done, a system log or journal is maintained by the transaction manager. The before- and after-images of the updated tuples are recorded in the log.

The properties of a transaction can be summarised as the ACID properties - ACID standing for atomicity, consistency, isolation and durability.
Atomicity: A transaction is atomic. Either all operations in the transaction have to be performed or none should be performed.
Consistency: Transactions preserve database consistency, i.e., a transaction transforms a consistent state of the database into another without necessarily preserving consistency at all intermediate points.
Isolation: Transactions are isolated from one another, i.e., a transaction's updates are concealed from all others until it commits (or rolls back).
Durability: Once a transaction commits, its updates survive in the database even if there is a subsequent system crash.

5.2 Recovery from System Failures

System failures (also called soft crashes) are those failures, like a power outage, which affect all transactions in progress but do not physically damage the database. During a system failure, the contents of the main memory are lost. Thus the contents of the database buffers which contain the updates of transactions are lost. (Note: Transactions do not directly write on to the database. The updates are written to database buffers and, at regular intervals, transferred to the database.) At restart, the system has to ensure that the ACID properties of transactions are maintained and the database remains in a consistent state. To attain this, the strategy to be followed for recovery at restart is as follows:
- Transactions which were in progress at the time of failure have to be undone at the time of restart. This is needed because the precise state of such a transaction, which was active at the time of failure, is no longer known and hence the transaction cannot be successfully completed.
- Transactions which had completed prior to the crash but could not get all their updates transferred from the database buffers to the physical database have to be redone at the time of restart.

This recovery procedure is carried out with the help of:

An online logfile or journal
The logfile maintains the before- and after-images of the tuples updated during a transaction. This helps in carrying out the UNDO and REDO operations as required. Typical entries made in the logfile are:
- Start of Transaction Marker
- Transaction Identifier
- Record Identifier
- Operations Performed
- Previous Values of Modified Data (Before-image or Undo Log)
- Updated Values of Modified Records (After-image or Redo Log)
- Commit / Rollback Transaction Marker

Taking a checkpoint at specific intervals
This involves the following two operations:
a) physically writing the contents of the database buffers out to the physical database. Thus, during a checkpoint, the updates of all transactions, including both active and committed transactions, will be written to the physical database.
b) physically writing a special checkpoint record to the physical log. The checkpoint record has a list of all transactions active at the time of taking the checkpoint.

5.3 Recovery : An Example

At the time of restart, T3 and T5 must be undone and T2 and T4 must be redone. T1 does not enter the recovery procedure at all since its updates were all written to the database at time tc as part of the checkpoint process.

5.4 Concurrency Concurrency refers to multiple transactions accessing the same database at the same time. In a system which allows concurrency, some kind of control mechanism has to be in place to ensure that concurrent transactions do not interfere with each other.

Three typical problems which can occur due to concurrency are explained here. a) Lost Update Problem

(To understand the above situation, assume that:
- there is a record R, with a field, say Amt, having value 1000 before time t1. Both transactions A & B fetch this value at t1 and t2 respectively.
- Transaction A updates the Amt field in R to 800 at time t3.
- Transaction B updates the Amt field in R to 1200 at time t4.

Thus after time t4, the Amt value in record R is 1200. The update by Transaction A at time t3 is over-written by Transaction B at time t4.)

b) Uncommitted Dependency Problem

(To understand the above situation, assume that:
- there is a record R, with a field, say Amt, having value 1000 before time t1. Transaction B fetches this value and updates it to 800 at time t1.
- Transaction A fetches R with Amt field value 800 at time t2.
- Transaction B rolls back and its update is undone at time t3. The Amt field takes the initial value 1000 during rollback.

Transaction A continues processing with Amt field value 800 without knowing about B's rollback.) c) Inconsistent Analysis Problem

5.5 Locking

Locking: a solution to problems arising due to concurrency. Locking of records can be used as a concurrency control technique to prevent the above-mentioned problems. A transaction acquires a lock on a record if it does not want the record values to be changed by some other transaction during a period of time. The transaction releases the lock after this time. Locks are of two types:
1. shared (S lock)
2. exclusive (X lock)
A transaction acquires a shared (read) lock on a record when it wishes to retrieve or fetch the record. An exclusive (write) lock is acquired on a record when a transaction wishes to update the record. (Here update means INSERT, UPDATE or DELETE.)

The following figure shows the Lock Compatibility matrix.

Normally, locks are implicit. A FETCH request is an implicit request for a shared lock, whereas an UPDATE request is an implicit request for an exclusive lock. Explicit lock requests need to be issued if a different kind of lock is required during an operation. For example, if an X lock is to be acquired before a FETCH, it has to be explicitly requested.
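As a sketch of how such explicit requests look in practice (Oracle-style syntax; the exact statements vary across DBMSs, and the table and column names are taken from the transfer example in section 5.1, while the customer id 1001 is hypothetical):

    SELECT balance
    FROM   deposit
    WHERE  custid = 1001
    FOR UPDATE;                             -- fetch the row with an exclusive (X) lock
                                            -- instead of the default shared lock

    LOCK TABLE deposit IN EXCLUSIVE MODE;   -- explicit X lock on the whole table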

5.6 Deadlocks

Locking can be used to solve the problems of concurrency. However, locking can also introduce the problem of deadlock, as shown in the example below.

Deadlock is a situation in which two or more transactions are in a simultaneous wait state, each of them waiting for one of the others to release a lock before it can proceed. If a deadlock occurs, the system may detect it and break it. Detection involves finding a cycle in the Wait-For Graph (a graph which shows 'who is waiting for whom'). Breaking a deadlock implies choosing one of the deadlocked transactions as the victim and rolling it back, thereby releasing all its locks. This may allow some other transaction(s) to proceed. Deadlock prevention can be done by not allowing any cyclic waits.

6. Query Optimization

6.1 Overview

When compared to other database systems, query optimization is a strength of relational systems, since relational systems by themselves do optimization to a large extent, unlike other systems which leave optimization to the programmer. Automatic optimization done by relational systems will be much more efficient than manual optimization, due to several reasons:
- uniformity in optimization across programs, irrespective of the programmer's expertise in optimizing the programs.
- the system's ability to make use of the knowledge of internal conditions (eg: volume of data at the time of querying) for optimization. For the same query, such conditions may be different at different times of querying. (In a manual system, this knowledge can be utilised only if the query is re-written each time, which is not practically possible.)
- the system's ability to evaluate a large number of alternatives to find the most efficient query evaluation method.

In this chapter we shall look into the process of automatic query optimization done by relational systems. 6.2 An Example of Query Optimization Let us look at a query being evaluated in two different ways to see the dramatic effect of query optimization. Consider the following query:

Select ORDDATE, ITEM#, QTY from ORDTBL, ORD_ITEMS where ORDTBL.ORD# = ORD_ITEMS.ORD# and ITEM# = 'HW3';

Assumptions:
- There are 100 records in ORDTBL
- There are 10,000 records in ORD_ITEMS
- There are 50 order items with ITEM# 'HW3'

Query Evaluation Method 1

T1 = ORDTBL X ORD_ITEMS (perform the Product operation as the first step towards joining the two tables)
- 10,000 X 100 tuple reads (1,000,000 tuple reads, generating 1,000,000 tuples as an intermediate result)
- 1,000,000 tuples written to disk (assuming the 1,000,000 intermediate tuples cannot be held in memory, they are written to a temporary space on disk)

T2 = σ [ORDTBL.ORD# = ORD_ITEMS.ORD# AND ITEM# = 'HW3'] (T1) (apply the two conditions in the query on the intermediate result obtained after the first step)
- 1,000,000 tuples read into memory (1,000,000 tuple reads)
- 50 tuples selected (those satisfying both conditions; the 50 are held in memory)

T3 = π [ORDDATE, ITEM#, QTY] (T2) (projection performed as the final step; no more tuple I/Os)
- 50 tuples (final result)

Total no. of tuple I/Os = 1,000,000 reads + 1,000,000 writes + 1,000,000 reads = 3,000,000 tuple I/Os

Query Evaluation Method 2

T1 = σ [ITEM# = 'HW3'] (ORD_ITEMS) (perform the Select operation on ORD_ITEMS as the first step)
- 10,000 tuple reads (from ORD_ITEMS)
- 50 tuples selected; no disk writes (the 50 tuples forming the intermediate result can be held in memory)

T2 = ORDTBL JOIN T1
- 100 tuple reads (from ORDTBL)
- resulting relation with 50 tuples

T3 = π [ORDDATE, ITEM#, QTY] (T2) (projection performed as the final step; no more tuple I/Os)
- 50 tuples (final result)

Total no. of tuple I/Os = 10,000 reads + 100 reads = 10,100 tuple I/Os

Comparison of the two Query Evaluation Methods: 10,100 tuple I/Os (Method 2) versus 3,000,000 tuple I/Os (Method 1)!

Thus, by sequencing the operations differently, a dramatic difference can be made in the performance of queries. Note that in Method 2 the first operation performed was a Select, which filters 50 tuples out of the 10,000 tuples in the ORD_ITEMS table, eliminating 9,950 tuples. Elimination in the initial steps therefore helps optimization. Some more examples:

Example 1:
select CITY, COUNT(*) from CUSTTBL where CITY != 'BOMBAY' group by CITY;
v/s
select CITY, COUNT(*) from CUSTTBL group by CITY having CITY != 'BOMBAY';

Example 2:
select * from ORDTBL where to_char(ORDDATE,'dd-mm-yy') = '11-08-94';
v/s
select * from ORDTBL where ORDDATE = to_date('11-08-94', 'dd-mm-yy');

In the first example, the form using the WHERE clause is generally preferable, since rows are eliminated before the grouping is performed rather than after. In the second example, the second form is faster. In the first form of that query, the function to_char is applied to an attribute and hence has to be evaluated for each tuple in the table; the time for this evaluation is therefore proportional to the cardinality of the relation. In the second form, the function to_date is applied to a constant and hence needs to be evaluated just once, irrespective of the cardinality of the relation. Moreover, if the attribute ORDDATE is indexed, the index will not be used in the first form, since the attribute appears inside an expression and its value is not used directly. 6.3 The Query Optimization Process The steps of query optimization are explained below. a) Cast into some Internal Representation This step involves translating the SQL query into some internal representation which is more suitable for machine manipulation. The internal form typically chosen is a query tree, as shown below. Query Tree for the SELECT statement discussed above:

b) Convert to Canonical Form In this second step, the optimizer makes use of transformation laws or rules for re-sequencing the internal operations involved. Some examples are given below. (Note: in all these examples the second form will be more efficient irrespective of the actual data values and physical access paths that exist in the stored database.)

Rule 1: (A JOIN B) WHERE restriction_A AND restriction_B
=> (A WHERE restriction_A) JOIN (B WHERE restriction_B)
Restrictions, when applied first, cause eliminations and hence better performance.

Rule 2: (A WHERE restriction_1) WHERE restriction_2
=> A WHERE restriction_1 AND restriction_2
The two restrictions are applied as a single compound restriction instead of applying the two individual restrictions separately.

Rule 3: (A[projection_1])[projection_2]
=> A[projection_2]
If there is a sequence of successive projections applied to the same relation, all but the last one can be ignored, i.e. the entire operation is equivalent to applying the last projection alone.

Rule 4: (A[projection]) WHERE restriction
=> (A WHERE restriction)[projection]
Restrictions, when applied first, cause eliminations and hence better performance.

Reference [1] gives more such general transformation laws.

c) Choose Candidate Low-level Procedures In this step, the optimizer decides how to execute the transformed query. At this stage, factors such as the existence of indexes or other access paths, physical clustering of records, distribution of data values, etc. are considered. The basic strategy is to consider the query expression as a set of low-level implementation procedures predefined for each operation. For example, there will be a set of procedures for implementing the restriction operation: one (say, procedure 'a') for the case where the restriction attribute is indexed, one (say, procedure 'b') where the restriction attribute is hashed, and so on. Each such procedure has an associated cost measure, typically in terms of disk I/Os. The optimizer chooses one or more candidate procedures for each low-level operation in the query. Information about the current state of the database (existence of indexes, current cardinalities, etc.), available from the system catalog, is used to make this choice of candidate procedures.

d) Generate Query Plans and Choose the Cheapest In this last step, query plans are generated by combining a set of candidate implementation procedures. This can be explained with the following example (a trivial one, but illustrative enough). Assume that there is a query expression comprising a restriction, a join and a projection. Some examples of implementation procedures available for each of these operations are given in the table below.

Implementation Procedure | Operation   | Condition Existing
a                        | Restriction | Restriction attribute is indexed
b                        | Restriction | Restriction attribute is hashed
c                        | Restriction | Restriction attribute is neither indexed nor hashed
d                        | Join        | -
e                        | Join        | -
f                        | Projection  | -
g                        | Projection  | -

Now the various query plans for the original query expression can be generated by making combinations of the implementation procedures available for the different operations. Thus the query plans can be adf, adg, aef, aeg, bdf, and so on. It has to be noted that, in reality, the number of possible query plans can be very large, and hence generating all such plans and then choosing the cheapest can be expensive in itself. Hence a heuristic reduction of the search space, rather than an exhaustive search, needs to be done. Considering the above example, one such heuristic can be as follows: if the system knows that the restriction attribute is neither indexed nor hashed, then only the query plans involving implementation procedure 'c' (and not 'a' or 'b') need to be considered, and the cheapest plan can be chosen from this reduced set of query plans.
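The following Java sketch illustrates this last step under simplifying assumptions: one candidate procedure is chosen per operation, the plan cost is taken as the sum of the procedure costs, and the costs shown are invented purely for illustration. A real optimizer would use catalog statistics and heuristics to prune the search space rather than enumerate exhaustively.

import java.util.List;

// Illustrative sketch: generate query plans by combining one candidate
// procedure per operation and pick the cheapest. Procedures a..g and their
// costs are hypothetical.
class PlanChooser {
    record Procedure(String name, int cost) {}

    public static void main(String[] args) {
        List<Procedure> restriction = List.of(new Procedure("a", 10), new Procedure("b", 15), new Procedure("c", 200));
        List<Procedure> join        = List.of(new Procedure("d", 120), new Procedure("e", 90));
        List<Procedure> projection  = List.of(new Procedure("f", 5),  new Procedure("g", 8));

        String bestPlan = null;
        int bestCost = Integer.MAX_VALUE;
        for (Procedure r : restriction)
            for (Procedure j : join)
                for (Procedure p : projection) {
                    int cost = r.cost() + j.cost() + p.cost();   // plan cost = sum of procedure costs
                    if (cost < bestCost) {
                        bestCost = cost;
                        bestPlan = r.name() + j.name() + p.name();
                    }
                }
        System.out.println("Cheapest plan: " + bestPlan + " with estimated cost " + bestCost);
    }
}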

6.4 Query Optimization in Oracle Some of the query optimization measures used in Oracle are the following:
- Indexes are unnecessary for small tables: if the size of the actual data record is not much larger than the index record, the search time in the index table and in the data table will be comparable, so an index will not make much difference to query performance.
- Indexes/clusters help when retrieving less than about 25% of the rows: when a larger fraction of the rows is retrieved, the overhead of searching the index file outweighs its benefit.
- In multiple-column WHERE clauses, the condition causing the largest number of eliminations should be evaluated first.
- JOIN columns should be indexed: JOIN columns (or foreign key columns) may be indexed, since queries based on these columns can be expected to be very frequent.
- Indexes are not used in queries containing NULL / NOT NULL conditions: index tables do not have entries for NULLs, hence there is no need to search for these in the index table.

Suggested References:
1. Date C. J., An Introduction to Database Systems, 7th edition, Addison-Wesley, 2000.
2. Korth H. F. & Silberschatz A., Database System Concepts, 2nd edition, McGraw-Hill.

Object Oriented Concepts


1. Introduction Improvement of programmer productivity has been a primary preoccupation of the software industry since the early days of computers, and it is no surprise that it continues to be a dominant theme even today. However, it should be noted that the concerns arising out of this preoccupation have changed with time. As software engineering and software technology evolved, the concern shifted from productivity during software development to the reliability of software. Hence the emphasis on extensibility and reusability of software modules. The perceived benefits are: a reduction in the effort needed to ensure the reliability of software, and improved productivity of the software development process.

Software being inherently complex, the time tested technique of decomposition also known as "DIVIDE and CONQUER" has been applied from the early days of software development. Two types of decomposition have been used, namely, algorithmic decomposition, and object-oriented decomposition. The algorithmic decomposition views software as Program={Data Structures}+{Operations} The program is viewed as a series of tasks to be carried out to solve the problem. The object-oriented decomposition views it as Object={Data Structures}+{Operations} Program={Objects} The program is viewed as a set of objects, which cooperate with each other to solve the problem. For example, consider a typical banking system. Algorithmic decomposition views it as a series of tasks, not unlike the following: Open an account Deposit money Withdraw money Transfer money between accounts Close an account ...etc...

The object-oriented decomposition views it as a set of objects, like the following: Account object o account number (data) o current balance (data) o get account number (operation) o update balance (operation) Account holder object o name (data) o address (data) o deposit money (operation) o withdraw money (operation) ...etc...
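For illustration, the Account object listed above could be sketched as a class, for example in Java; the data and the operations on that data are packaged together. The types, constructor and method signatures are assumptions made purely for the sketch.

// Sketch of the Account object: data and operations packaged together.
class Account {
    private String accountNumber;   // data
    private double currentBalance;  // data

    Account(String accountNumber, double openingBalance) {
        this.accountNumber = accountNumber;
        this.currentBalance = openingBalance;
    }

    String getAccountNumber() {         // operation
        return accountNumber;
    }

    void updateBalance(double amount) { // operation: positive for a deposit, negative for a withdrawal
        currentBalance += amount;
    }

    double getCurrentBalance() {        // operation
        return currentBalance;
    }
}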

2. How object oriented decomposition helps software productivity Data and operations are packaged together into objects. This facilitates independent and parallel development of code. Because of this packaging, most updates are localised, which makes it easier to maintain OO systems. It also allows reuse and extension of behaviour, and permits iterative development of large systems.

Basically, these gains arise from the dominant themes in the OO paradigm. They are: Data Abstraction: this provides a means by which a designer or programmer can focus on the essential features of her data. It avoids unnecessary attention being paid to the implementation of data. Encapsulation: this helps in controlling the visibility of the internal details of objects. It improves the security and integrity of data. Hierarchy: this is the basic means of providing extensibility of software modules, and helps in increasing the reuse of modules. Polymorphism: this is the means by which an operation behaves differently in different contexts. These themes are discussed further below against the backdrop of significant periods in the history of software engineering methodologies. 3. Historical Perspective After early experience in software development, the software field hoped that structured programming ideas would provide solutions to its problems. Structured programming is a style of programming usually associated with languages such as C, Fortran, Pascal and so on. Using structured programming techniques, a problem is often solved using a divide and conquer approach. An initially large problem is broken into several smaller sub-problems. Each of these is then progressively broken into even smaller sub-problems, until the level of difficulty is considered to be manageable. At the lowest level, a solution is implemented in terms of data structures and procedures. This approach is often used with imperative programming languages that are not object-oriented, i.e. the data structures and procedures are not implemented as classes. Structured programming was looked upon as a technique to make programs easier to understand, modify, and extend. It was felt that use of the most appropriate control structures and data structures would make a program easy to understand. It was realised that branching (use of goto's) makes a program unstructured. It was also felt that a program should be modular in nature. The language PL/I was designed around this time. It is a language rich in data structures and control structures and promotes modularity. However, the ideas of structured programming were devoid of a methodology. Given a program, one could apply the definition of structuredness and say whether it was a structured program or not, but one did not know how to design a structured program. Similarly, guidelines for developing a good modular structure for a program were conspicuously absent. In the absence of a methodology, a novice could obtain absurd results starting from a problem specification.

For example, the rule that each module should be about 50 lines of code could be used to decide that each segment of 50 lines in a program should become a module. Around this time it was also felt that having to use only the set of predefined data types of the language was very constraining, and hence prone to design errors and program bugs. So a methodology for program design was necessary. It should: make the task of program design simple and systematic; provide effective means to control the complexity of program design; and provide easy means to help a user design her own data.

The PASCAL language was designed with a rich control structure, and with the means to support user-defined data. The designer of PASCAL, Niklaus Wirth, also popularised the methodology of stepwise refinement. It can be summarised as follows: 1. Write a clear statement of the problem in simple English. Avoid going to great depth. Also focus on what is to be done, not on how it is to be done. 2. Identify the fundamental data structures involved in the problem. This data belongs to the problem domain. 3. Identify the fundamental operations which, when performed on the data defined in the above step, would implement the problem specification. These operations are also domain specific. 4. Design the data identified in step 2 above. 5. Design the operations identified in step 3 above. If any of these operations are composed of other operations, apply steps 1-4 to each such operation. Application of this methodology in a recursive manner leads to a hierarchical program structure. An operation being defined becomes a 'problem' to be solved at a lower level, leading to the identification of its own data and sub-operations. When a clear and unambiguous statement of the problem is written in simple English in the above-stated manner, the data gets identified in abstract form: only the essential features of the problem get highlighted, and other features are suppressed. Similarly, the essential features of an operation get highlighted. Such data and operations are said to be 'domain-specific'; they are directly meaningful in the problem domain. Use of domain-specific data and operations simplifies program design. It is more natural than the use of data or operations which have more or less generality than needed, and it is also less prone to design or programming errors.

PASCAL permitted the programmer to define her own data types. These could be: simple data, viz. enumerated data types; and structured data, viz. array and record types. Operations on user-defined data were coded as procedures and functions. Thus, a user could define the set of values a data item could take, and could also define how the values were to be manipulated. The point to be noted here is that permitting a user to define her own data permits her to use the most appropriate data for a problem. For example, an inventory control program could start like this:
type code = (issue, receipt);
var trans_code : code;

begin
  ...
  if trans_code = issue then ...
The advantage of this is that programming is now more natural, and hence less prone to errors. PASCAL was also the first widely used language to emphasize the importance of compile-time validation of a program. This implies the following: syntax errors should be detected by the compiler (most compilers do this); violations of language semantics should also be detected by the compiler; and the semantic checks should also apply to user-defined data.

Thus, invalid use of data should be detected by the compiler. This eliminates the debugging effort which would otherwise be required to detect and correct the error. PASCAL does not fully succeed in implementing compile-time validation; however, stressing its importance is one of its achievements. PASCAL lacks features to define the legal operations on user-defined data and to ensure that only such operations are used on the data. This endangers data consistency. For example:
type complex_conjugate = record
  real1 : real;
  imag1 : real;
  real2 : real;
  imag2 : real;
end;
var cc : complex_conjugate;
begin
  ...
  cc.real1 := cc.real1 + 3;  {this violates the idea of complex conjugates}
The user-defined data is like any other data, and hence its components can be used in any manner consistent with the types of the components. Complex conjugates should only be used in some specific manner, but this is not known to the compiler: it will allow the component real1 to be used as an ordinary real variable. SIMULA was the first language which introduced the concept of classes. In fact, it is the first language which introduced many of the object-oriented concepts which are taken for granted today. A class in SIMULA is a linguistic unit which contains: the definition of the legal values that data of the class can assume, and the definition of the operations that can be performed on the data.

A user can create variables of a class in her program, and can use the operations defined in the class on these variables. This way the compiler knows what operations are legal on the data; attempts to use any other operations will lead to compile-time errors. However, SIMULA did not provide encapsulation. Encapsulation implies 'sealing off' the internal details of data and operations. Use of encapsulation in a class promotes information hiding: only those data and operations which are explicitly declared as visible to the outside world will be visible, and the rest will be hidden. Encapsulation also implies compile-time validation: it ensures that a program cannot perform any invalid operation on the data. Thus, illegal access to the imaginary part of a complex number can be prevented. This takes compile-time validation of a program one step further. (At this point you are encouraged to visit the links [1] and [2] listed in the references.)
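For illustration, the following sketch (in Java, a language that does provide encapsulation) shows how hiding the representation of a complex number prevents the kind of illegal access discussed above; the class name and operations are assumptions made for the sketch.

// Sketch of encapsulation: the internal representation is hidden, so clients
// can only use the operations the class chooses to expose.
class Complex {
    private double re;   // hidden: cannot be manipulated directly by other classes
    private double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    Complex add(Complex other) {      // legal operation, made public to clients
        return new Complex(re + other.re, im + other.im);
    }

    Complex conjugate() {             // legal operation, made public to clients
        return new Complex(re, -im);
    }

    @Override
    public String toString() {
        return re + (im >= 0 ? " + " : " - ") + Math.abs(im) + "i";
    }
}

A statement such as c.im = c.im + 3 written in another class would be rejected at compile time, which is precisely the validation that PASCAL could not provide.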

Data abstraction coupled with encapsulation provides considerable advantages in the presence of modularity. Consider a program consisting of two modules A and B. We can make the following observations: Module B defines its own data. The data of module B can only be manipulated by its operations, also defined in module B. Hence, module A cannot access the data of module B directly. The responsibility for correct manipulation of the data now rests only with module B. Changes made inside module B do not affect module A. This simplifies debugging.

Now we are in a position to clarify the expressions: Object = {Data structures} + {Operations} Program = {Objects} An object is a collection of data items and the operations that manipulate those data items. An object-oriented program is a collection of objects which interact with each other.

Each object is an instance of a class. A class defines a 'data type', and each object is a variable of that 'data type'. Thus a class is a template and objects are its 'copies'. The data structures are declared in the class, and the operations are also defined in the class; each operation is called a method. Each object of the class contains a copy of the data structures declared in the class. These are called the instance variables of the object. The operations in all objects of a class share the code of the class methods; thus a single copy of the code exists in the program.

In most present-day object oriented systems, the execution of an object oriented program is 'single threaded', i.e. only one object is active at a time. The program consists of one 'main' object which is active when the program is initiated.

4. Abstraction Abstraction provides a well-defined conceptual boundary relative to the perspective of the viewer. When we define abstract data, we identify its essential characteristics; all other characteristics are unimportant. There are different types of abstraction: Entity abstraction: the object presents a useful model of an entity in the problem domain. Action abstraction: the object provides a generalised set of operations, all of which perform the same kind of function. Virtual machine abstraction: the object groups together operations that are all used by some superior level of control. Coincidental abstraction: the object packages operations that have no relation to each other.

Entity abstraction is considered to be the best form of abstraction, as it groups together the data and operations concerning an entity. For example, an object called salaried_employee could contain operations like compute_income_tax, compute_pf, etc. Thus information about an entity is localised. One often faces the question of how to identify useful objects from a problem specification. The answer is to look for nouns in the specification: they represent 'things' or entities. This leads to entity abstraction. 5. Encapsulation While abstraction helps one to focus on the essential characteristics of an object, encapsulation enables one to expose only those details that are necessary to use the object effectively. This is achieved through information hiding and the 'need to know' principle. One could declare some of the variables and methods of the object as private and the rest as public; only the public variables and methods will be accessible to other objects. By carefully selecting what is made available to the outside world, the developer can prevent illegal use of objects. For example, consider a class called Person with an operation like compute_income_tax. Details of a person's income are not essential to a viewer of this class; only the ability to compute the income tax for each object is essential. Hence the data members of the class can be declared private, while compute_income_tax can be public.
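A minimal sketch of the Person example, assuming Java and an invented flat tax rate used purely for illustration:

// The income details are private (hidden); the service the rest of the
// system needs is public.
class Person {
    private String name;
    private double annualIncome;   // not visible to viewers of the class

    Person(String name, double annualIncome) {
        this.name = name;
        this.annualIncome = annualIncome;
    }

    public double computeIncomeTax() {
        return annualIncome * 0.10;   // assumed flat 10% rate, for illustration only
    }
}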

6. Hierarchy Hierarchy is a ranking or ordering of abstractions. The important hierarchies in an OO system are the class structure hierarchy and the object structure hierarchy.

A class structure hierarchy is used for the sharing of behaviour and code between different classes of entities which have some features in common. This is achieved through inheritance. It is called the 'is a' hierarchy. Features that are common to many classes are migrated to a common class, called the base class (or super-class, or parent class). Other classes may add, modify, or even hide some of these features; these are called derived classes (or sub-classes, or child classes). For example, vehicle can be a base class, and two-wheeler, three-wheeler and four-wheeler can be derived classes. There is also the possibility of inheriting from more than one class. This is called multiple inheritance. For example, a two-in-one inherits the features of a radio and a tape recorder. However, multiple inheritance can raise the following complications: the possibility of name clashes, and repeated inheritance from two peer super-classes.
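The vehicle hierarchy can be sketched as follows (in Java, which supports single inheritance of classes; the two-in-one style of multiple inheritance would be modelled with interfaces in Java). The attribute and method names are illustrative only.

// 'is a' hierarchy: common features live in the base class Vehicle.
class Vehicle {
    protected int numberOfWheels;

    String describe() {
        return "A vehicle with " + numberOfWheels + " wheels";
    }
}

class TwoWheeler extends Vehicle {
    TwoWheeler() { numberOfWheels = 2; }
}

class FourWheeler extends Vehicle {
    FourWheeler() { numberOfWheels = 4; }

    @Override
    String describe() {   // a derived class may modify inherited behaviour
        return super.describe() + " (e.g. a car)";
    }
}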

The object structure hierarchy is also called the 'part of' hierarchy. It is implemented through aggregation, that is, one object becomes a part of another object. Aggregation permits the grouping of logically related structures. It is not unique to object-oriented systems; for example, in C a structure can be used to group logically related data elements and structures. In the OO context, a vehicle consists of many parts like the engine, wheels, chassis, etc., and hence it can be represented as an aggregation of many parts.

7. Polymorphism Polymorphism is a concept wherein a name may denote instances of different classes as long as they are related by some common super-class. Any such name is thus able to respond to some common set of operations in different ways. For example, let us say that we have declared a class called polygon with a function called draw(). We have derived three classes, rectangle, triangle and pentagon, each with its own redefinition of the draw() function. The version of the draw() function to be used at run time is decided by the class of the object through which it is called. This is polymorphism. It assists in adopting a unified design approach.
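A sketch of the polygon example in Java; the method bodies are placeholders, and the point is that the same call p.draw() selects different behaviour at run time depending on the actual class of the object.

abstract class Polygon {
    abstract void draw();
}

class Rectangle extends Polygon {
    void draw() { System.out.println("Drawing a rectangle"); }
}

class Triangle extends Polygon {
    void draw() { System.out.println("Drawing a triangle"); }
}

class Pentagon extends Polygon {
    void draw() { System.out.println("Drawing a pentagon"); }
}

class DrawDemo {
    public static void main(String[] args) {
        Polygon[] shapes = { new Rectangle(), new Triangle(), new Pentagon() };
        for (Polygon p : shapes) {
            p.draw();   // same call, different behaviour: polymorphism
        }
    }
}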

8. Modularity Modularity creates a number of well-defined, documented boundaries within a system. A module typically clusters logically related abstractions. This is invaluable in understanding a system. Each module can be compiled separately, but has connections with other modules. The connections between modules are the assumptions which the modules make about each other. Ideally, a module should be a single syntactic construct in the programming language.

9. Meyer's Criteria Bertrand Meyer suggests five criteria for evaluating a design method's ability to achieve modularity, and relates these to object-oriented design: Decomposability: the ability to decompose a large problem into sub-problems. Composability: the degree to which modules, once designed and built, can be reused to create other systems. Understandability: the ease of understanding program components by themselves, i.e. without referring to other components. Continuity: the ability to make incremental changes. Protection: a characteristic that reduces the propagation of side effects of an error in one module.

Based on the foregoing criteria, Meyer suggests five rules to be followed to ensure modularity: Direct mapping: Modular structure of the software system should be compatible with modular structure devised in the process of modeling the problem domain Few interfaces: Minimum number of interfaces between modules. Small interfaces: Minimum amount of information should move across an interface. Explicit interfaces: Interfaces should be explicit. Communication through global variables violates this criterion. Information hiding: Information concerning implementation is hidden from the rest of the program.

Based on the above rules and criteria, the following five design principles follow: Linguistic modular units principle: Modules must correspond to syntactic units in the language used. Self-documentation principle: All information about a module should be part of the module itself.

Uniform access principle: All services offered by a module should be available through a uniform notation Open-closed principle: Modules should be both open and closed. That is, it should be possible to extend a module, while it is in use. Single choice principle: Whenever a software system must support a set of alternatives, one and only one module in the system should know their exhaustive list.

A class in an OO language provides a linguistic modular unit. A class provides explicit interfaces and information hiding. This is what sets OO languages apart from early languages like PASCAL and SIMULA. 10. Characteristics of an object From the perspective of human cognition, an object is one of the following: a tangible and/or visible thing; something that may be apprehended intellectually; or something towards which thought or action is directed.

Some objects may have clear physical identities (e.g. a machine), while others may be intangible but with crisp conceptual boundaries (e.g. a chemical process). However, the following statement is true for all objects: an object has state, behaviour, and identity. An object's behaviour is governed by its state. For example, a two-in-one cannot operate as a radio when it is in tape-recorder mode. An operation is a service that an object offers to the rest of the system. There are five kinds of operation: Modifier: the operation alters the state of the object. Selector: the operation accesses the state of an object but does not alter it. Iterator: the operation accesses the parts of an object in some order. Constructor: the operation creates an object and initialises its state. Destructor: the operation destroys the object.
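The five kinds of operation can be illustrated with a small sketch. Note that Java has no user-defined destructor; releasing resources is typically done by an explicit method such as close(), so that kind is only approximated here. All names are illustrative.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BankAccount {
    private double balance;
    private final List<Double> transactions = new ArrayList<>();

    BankAccount(double openingBalance) {      // constructor: creates the object and initialises its state
        this.balance = openingBalance;
    }

    void deposit(double amount) {             // modifier: alters the state of the object
        balance += amount;
        transactions.add(amount);
    }

    double getBalance() {                     // selector: accesses the state without altering it
        return balance;
    }

    Iterator<Double> transactionIterator() {  // iterator: accesses parts of the object in some order
        return transactions.iterator();
    }

    void close() {                            // closest Java analogue of a destructor-like operation
        transactions.clear();
        balance = 0;
    }
}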

There are three categories of objects: Actor: The object acts upon other objects but is never acted upon. Server: The object is always acted upon by other objects. Agent: The object acts on other objects and is also acted upon by others.

There are three levels of visibility into a class. Public: a feature that is declared public is accessible to the class itself and to all its clients. Protected: a feature that is declared protected is accessible to the class itself, its derived classes and its friends. Private: a feature that is declared private is accessible only to the class itself and its friends.

11. Extensibility The notions of abstraction and encapsulation provide us with understandability, continuity, and protection. As we said in the beginning, one of the goals of the OO paradigm is extensibility. It can be achieved in two ways: extension by scaling and semantic extension. Consider a banking application which supports two 'banking stations'. What changes are required to support additional stations? Let us say the application contains a class called BSTATION, and each banking station is an instance of this class. To incorporate the extension: o we create another instance of BSTATION; this instance will share the code for the methods with the other instances, but will have its own copies of the data; o use of the new station is integrated into the software, i.e. provision is made to direct some of the workload to the new station.

This is extension by scaling. It is made simple by the notion of a class: we can achieve scaling by merely declaring another variable of the class. Semantic extension may involve changing the behaviour of existing entities in an application, or defining new kinds of behaviour for an entity. For example, we may want to compute the average age of students in years, months and days, or we may want to compute the average marks of students. Let us say these requirements necessitate changes to the entity 'Persons'. The classical process is to declare that the entity is under maintenance. This implies: applications using 'Persons' must be held in abeyance, i.e. they cannot be used until the maintenance is complete; 'Persons' must then be modified; it must then be extensively tested; if necessary, each application using 'Persons' should be retested; then the existing applications using 'Persons' can be resumed; and new applications using 'Persons' can be developed, tested, etc.

Thus, this form of maintenance for extension is obtrusive: it interrupts the running applications. Avoiding the obtrusion leads to different versions of the software. Semantic extension using the object-oriented methodology does not suffer from this problem. Applications using an entity are not affected by the extension and can continue to run while the entity is being extended. Such applications need not be retested nor revalidated. This is achieved through the notion of inheritance.
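A sketch of such a semantic extension through inheritance (in Java, with illustrative names matching the 'Persons' entity of the example): the existing class is left untouched, so applications that use it keep running, while the new behaviour is added in a subclass.

class Persons {
    protected int ageInDays;

    Persons(int ageInDays) { this.ageInDays = ageInDays; }

    int ageInYears() { return ageInDays / 365; }
}

// Semantic extension: new kinds of behaviour are defined without modifying Persons.
class Student extends Persons {
    private double marks;

    Student(int ageInDays, double marks) {
        super(ageInDays);
        this.marks = marks;
    }

    double getMarks() { return marks; }

    String detailedAge() {   // age in years, months and days (months approximated as 30 days)
        int years = ageInDays / 365;
        int months = (ageInDays % 365) / 30;
        int days = (ageInDays % 365) % 30;
        return years + " years, " + months + " months, " + days + " days";
    }
}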

12. Conclusions Thus the object-oriented approach has many advantages over the structured methodology. There are many languages which support object-oriented programming; some of the popular ones are Smalltalk, Eiffel, Ada, C++ and Java. The choice of language will usually be based on business requirements. The object-oriented approach itself is most suitable in business areas which are: rapidly changing; requiring a fast response to changes; complex in requirements or implementation; repeated across the company with variations; or having long-life applications that must evolve.

13. References
1. "Object-Oriented Analysis and Design with Applications", by Grady Booch, Addison-Wesley
2. "Object-Oriented Software Construction", by Bertrand Meyer, Prentice-Hall
3. "Object-Oriented Modeling and Design", by James Rumbaugh, et al., Prentice-Hall of India
4. "Object-Oriented Software Engineering: A Use Case Driven Approach", by Ivar Jacobson, Addison-Wesley
5. Object oriented concepts
6. Introduction to object oriented programming using C++
7. An introduction to design by contract

OOAD using UML


1. Modeling 1.1 What is Modeling? A software development method consists of a modeling language and a process. The Unified Modeling Language (UML) is called a modeling language, not a method. The modeling language is the notation that methods use to express designs; the process describes the steps taken in doing a design. Models provide various perspectives which, when put together, provide an overall view of the system. While creating a model, for a given level of abstraction, it is to be decided which elements are to be included and which are to be excluded. Some type of notation represents models visually; the notation often takes the form of graphical symbols and connections. Models are built so that we can better understand the system that we are developing. Models in software help us to visualize, specify, construct and document the artifacts of a software-intensive system. 1.2 Principles of Modeling The choice of the right model is extremely important, since it decides the way in which we deal with the problem. The levels of precision for each model vary. The best models are connected to reality. A single model may not be able to represent all the details of a system; a system can be represented by a set of independent models. 1.3 What is Business Modeling? UML is not only used for modeling system software but also for modeling the business process. Business modeling is a technique to model business processes. Business models provide ways of expressing the business processes in terms of business activities and collaborative behavior. Business modeling is a technique which will help in finding out whether we have identified all the system use cases, as well as in determining the business value of the system. Why Business Modeling? The system can provide value only if we know how it will be used, who will use it and in what circumstances it will be used. To ensure that customer-oriented solutions are built, we must not overlook: the environment in which these systems will work; the roles and responsibilities of the employees using the system; and the "things" that are handled by the business, as a basis for building the system.

One of the great benefits of business modeling is to elicit better system requirements, requirements that will drive the creation of information systems that actually fit in the organization and that will indeed be used by end-users.

2. Unified Process 2.1 What is Unified Process? A process is a set of activities intended to reach a goal. The inputs to the software process are the needs of the business and the output will be the software product. We can use UML with a number of software engineering processes. The Unified Process is one such lifecycle approach well-suited to the UML. The goals of the Unified Process are to enable the production of highest quality software that meets end-user needs with predictable schedules and budgets. The Unified Process captures some of the best current software development practices in a form that is tailorable for a wide range of projects and organizations. On the management side, the Unified Process provides a disciplined approach on how to assign tasks and responsibilities within a software development organization. 2.2 Unified Process - Features The Unified Process captures many of modern software development's best practices in a form suitable for a wide range of projects and organizations: Develop software iteratively and incrementally. Manage requirements. Use component-based architectures. The Unified Process is iterative, incremental, architecture-centric, use-case driven. a) Iterative The development is iterative and involves a sequence of steps and each iteration adds some new information. Each iteration is evaluated and used to produce input for the next iteration. Thus, this process provides continuous feedback that improves the final product. b) Incremental The iterations are incremental in function. Each iteration builds on the use cases developed in the previous iterations. c) Architecture-Centric The importance of well defined basic system architecture is realized and is established in the initial stage of the process. The architectural blue print serves as a solid basis against which to plan and manage software component based development. d) Use Case driven

Development activities under the Unified Process are use case driven. The Unified process places strong emphasis on building systems based on a thorough understanding of how the delivered system will be used. 2.3 A brief outline of a typical development process A typical OO development process is iterative and incremental. It has the following stages: Inception Elaboration Construction Transition

A software system is not released in a big bang at the end of the project but is developed and released in phases. The construction phase typically has many iterations, in which each iteration builds production-quality software (tested and integrated) that satisfies a subset of the project requirements.

2.3.1 Inception phase During the inception phase, we develop a business model for the project and determine roughly how much it will cost and how much it will earn. The feasibility study is performed, and the overall scope and size of the project are determined during the inception phase. The actors of the system and their interaction with the system are analyzed at a high level. Identifying all use cases and describing the important use cases also happens at this stage. Inception can take many forms: it may be as informal as a chat in the cafeteria, or a full-fledged feasibility study that takes weeks. At the end of the inception stage, the following objectives are to be achieved: concurrence on the scope of the project and the estimates, and an understanding of the requirements. 2.3.2 Elaboration phase In the elaboration phase, a baseline architecture is established, the project plan is developed and risk assessment is also performed. The major types of risks are: Requirements Risks, Technological Risks, Skills Risks and Political Risks.

Dealing with Requirements Risks Use cases can be employed to provide a basis for communication between the customer and the developers in planning a project. A use case is a typical interaction between the user and the system to achieve a goal. For example, for an ATM one use case will be 'dispense cash'; others may be 'print receipt', 'display balance', and so on. Skeleton domain models can be drawn using OO techniques like UML. Considerations for workflow, rule base and security need to be made.

Dealing with Technological Risks Build prototypes that try out pieces of technology that we plan to use. e.g. if we are using Java and relational database, build a simple application using these. Try out tools that support our choice of technology. Spend time getting comfortable with them. Also, consider how easy or difficult it is to port to other platforms in future.

Dealing with Skills Risks Acquire skills through training and mentoring

Read relevant technical books Look for patterns o Pattern is an idea that has been useful in one practical context and may probably be useful in others o In the modeling context, they can be looked upon as example models o They describe common ways of doing things o They are collected by people who spot repeating themes in analysis and design o Each theme is described so that people can read it and see how to apply it. Thus, patterns promote reuse.

Dealing with Political Risks Training on effective teamwork can be of great help here. At this stage, the cost and schedule estimates can be made with reasonable confidence and a plan for the construction phase can be created.

Planning the Construction phase
- Have the customer categorize use cases into high, medium and low business value.
- Have the developers categorize use cases into high, medium and low risk; these reflect the level of difficulty, the impact on system design, lack of adequate understanding, and so on.
- Assuming a fully committed developer, estimate the length of time required for each use case, including analysis, design, coding, unit testing, integration and documentation.
- Determine the iteration length: an iteration should be long enough for us to do several use cases.
- Estimate the effort involved for each iteration, applying load factors: we will never have a developer with no distractions, so the difference between ideal time and reality is the load factor.
- Determine the project velocity, i.e. how fast we can go: how much development can be done in an iteration. For example, if we have 6 developers, a 3-week iteration length and a load factor of 2, we will have 9 ideal developer-weeks per iteration.
- Deal with the use cases with high risk first.
- Have a release plan ready.
- Each iteration is a mini-project in itself. It should end with system tests to confirm that the use cases have been built correctly, and a demo to the user.
- Each iteration builds on the use cases developed in the previous iterations; it is incremental in nature.
- Each iteration will involve rewriting some existing code to make it more flexible; it is iterative in nature.
- Use refactoring when iterating code.
- Test extensively.

Refactoring Software Entropy: The principle of software entropy suggests that programs start off in a well designed state, but as bits of functionality are added, they lose their structure and deform into a mass of spaghetti! Refactoring is a technique used to reduce the short term pain of redesigning

It involves small steps like renaming a method, consolidating similar methods into a superclass, and the like. Each step is tiny, but performing these steps can make a remarkable difference to the program. Test after each such step. Do not add new functionality and refactor at the same time.

At the end of the elaboration phase: the use case model should be complete; nonfunctional requirements should be elaborated; the software architecture should be described; a revised risk list should be present; and a preliminary user manual (optional) may be produced.

2.3.3 Construction phase All the components are developed and the components are integrated during the construction phase. All the features are completely tested during this stage. Resources are managed and operations are controlled to optimize cost, schedule and quality. The construction phase is incremental and iterative. It is incremental because each iteration builds on the code developed in the previous iterations. It is iterative since each iteration involves rewriting some existing code to make it more flexible. Refactoring is done after every iteration. At the end of the construction, The product should be stable and mature for release Actual versus planned expenditure should be acceptable

2.3.4 Transition phase The objective of this phase is to transition the software product to the user community. Developing new releases, correcting defects and optimization of the software are part of this phase. The activities in this phase include User Training Conversion of Operational databases Roll out the product to marketing and sales

The objectives of the transition phase are Customer Satisfaction Achieving the concurrence of the stakeholders that the deployment baselines are complete and consistent with the evaluation criteria Achieving final product baseline rapidly in a cost effective manner

3. Introduction to UML *

The UML may be used to visualize, specify, construct and document the artifacts of a software-intensive system. The UML is only a language and so is just one part of a software development method. The UML is process independent, although optimally it should be used in a process that is use case driven, architecture-centric, iterative, and incremental. 3.1 What is UML The Unified Modeling Language (UML) is a robust notation that we can use to build OOAD models. It is called so, since it was the unification, of the ideas from different methodologies from three amigos Booch, Rumbaugh and Jacobson. The UML is the standard language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. It can be used with all processes, throughout the development life cycle, and across different implementation technologies. The UML combines the best from Data Modeling concepts Business Modeling (work flow) Object Modeling Component Modeling

The UML may be used to Display the boundary of a system and its major functions using use cases and actors Illustrate use case realizations with interaction diagrams Represent a static structure of a system using class diagrams Model the behavior of objects with state transition diagrams Reveal the physical implementation architecture with component & deployment diagrams Extend the functionality of the system with stereotypes

* This material is not in compliance with UML 2.0 3.2 Evolution of UML One of the methods was the Object Modeling Technique (OMT), devised by James Rumbaugh and others at General Electric. It consists of a series of models (use case, object, dynamic, and functional) that combine to give a full view of a system. The Booch method was devised by Grady Booch and developed the practice of analyzing a system as a series of views. It emphasizes analyzing the system from both a macro development view and a micro development view, and it was accompanied by a very detailed notation. The Object-Oriented Software Engineering (OOSE) method was devised by Ivar Jacobson and focused on the analysis of system behavior. It advocated that at each stage of the process there should be a check to see that the requirements of the user were being met.

3.3 Why Unification Each of these methods had their strong points and their weak points. Each had its own notation and its own tools. This made it very difficult for developers to choose the method and notation that suited them and to use it successfully. New versions of some of the methods were created, each drawing on strengths of the others to augment their weaker aspects. This led to a growing similarity between the methods and hence the Unification. 3.4 The UML is composed of three different parts: Model elements Diagrams Views

Model Elements The model elements represent basic object-oriented concepts such as classes, objects, and relationships. Each model element has a corresponding graphical symbol to represent it in the diagrams.

Diagrams Diagrams portray different combinations of model elements. For example, the class diagram represents a group of classes and the relationships, such as association and inheritance, between them. The UML provides nine types of diagram - use case, class, object, state chart, sequence, collaboration, activity, component, and deployment

Views Views provide the highest level of abstraction for analyzing the system. Each view is an aspect of the system that is abstracted to a number of related UML diagrams. Taken together, the views of a system provide a picture of the system in its entirety. In the UML, the five main views of the system are o User o Structural o Behavioral o Implementation o Environment

In addition to model elements, diagrams, and views, the UML provides mechanisms for adding comments, information, or semantics to diagrams. And it provides mechanisms to adapt or extend itself to a particular method, software system, or organization. Extensions of UML Stereotypes can be used to extend the UML notational elements Stereotypes may be used to classify and extend associations, inheritance relationships, classes, and components

Examples Class stereotypes: boundary, control, entity Inheritance stereotypes: extend Dependency stereotypes: uses Component stereotypes: subsystem

3.5 UML Diagrams 3.5.1 Use Case Diagrams The use case diagram presents an outside view of the system The use-case model consists of use-case diagrams. The use-case diagrams illustrate the actors, the use cases, and their relationships. Use cases also require a textual description (use case specification), as the visual diagrams can't contain all of the information that is necessary.

The customers, the end-users, the domain experts, and the developers all have an input into the development of the use-case model. Creating a use-case model involves the following steps: 1. defining the system 2. identifying the actors and the use cases 3. describing the use cases 4. defining the relationships between use cases and actors. 5. defining the relationships between the use cases

Use Case (definition): a sequence of actions a system performs that yields an observable result of value to a particular actor.

Use Case Naming: A use case should always be named in business terms, picking words from the vocabulary of the particular domain for which we are modeling the system. The name should be meaningful to the user, because use case analysis is always done from the user's perspective. Names will usually be verbs or short verb phrases. The Use Case Specification shall document the following: Brief Description, Precondition, Main Flow, Alternate Flows, Exceptional Flows, Post Condition, and Special Requirements.

Notation of use case

Actor (definition): someone or something outside the system that interacts with the system. An actor is external: it is not actually part of what we are building, but an interface needed to support or use it. It represents anything that interacts with the system. Notation for Actor

Relation: Two important types of relation are used in Use Case Diagrams. Include: an include relationship shows behavior that is common to one or more use cases (mandatory). An include relation results when we extract the common sub-flows and make them a use case. Extend: an extend relationship shows optional behavior. An extend relation usually results when we add a more specialized feature to an already existing use case; we say that use case B extends its functionality to use case A.

A system boundary rectangle separates the clinic system from the external actors. An extend relationship indicates that one use case is a variation of another. Extend notation is a dotted line, labeled <<extend>>, and with an arrow toward the base case. The extension point, which determines when the extended case is appropriate, is written inside the base case. 3.5.2 Class Diagrams Class diagram shows the existence of classes and their relationships in the structural view of a system. UML modeling elements in class diagrams Classes and their structure and behavior Relationships o Association o Aggregation o Composition o Dependency o Generalization / Specialization ( inheritance relationships) o Multiplicity and navigation indicators o Role names

A class describes properties and behavior of a type of object. Classes are found by examining the objects in sequence and collaboration diagram A class is drawn as a rectangle with three compartments Classes should be named using the vocabulary of the domain Naming standards should be created e.g., all classes are singular nouns starting with a capital letter

The behavior of a class is represented by its operations. Operations may be found by examining interaction diagrams The structure of a class is represented by its attributes Attributes may be found by examining class definitions, the problem requirements, and by applying domain knowledge

Notation:

Class information: visibility and scope The class notation is a 3-piece rectangle with the class name, attributes, and operations. Attributes and operations can be labeled according to access and scope. Here is a new, expanded Order class. Relationship - Association Association represents the physical or conceptual connection between two or more objects An association is a bi-directional connection between classes An association is shown as a line connecting the related classes

- Aggregation An aggregation is a stronger form of relationship where the relationship is between a whole and its parts. It is entirely conceptual and does nothing more than distinguish the whole from its parts; it does not link the lifetime of the whole and its parts. An aggregation is shown as a line connecting the related classes, with a diamond next to the class representing the whole.

- Composition Composition is a form of aggregation with strong ownership, in which the lifetime of the part coincides with that of the whole.

- Multiplicity Multiplicity defines how many objects participate in relationships Multiplicity is the number of instances of one class related to ONE instance of the other class

This table gives the most common multiplicities.

Multiplicity | Meaning
0..1         | zero or one instance (the notation n..m indicates n to m instances)
0..* or *    | no limit on the number of instances (including none)
1            | exactly one instance
1..*         | at least one instance

For each association and aggregation, there are two multiplicity decisions to make: one for each end of the relationship Although associations and aggregations are bi-directional by default, it is often desirable to restrict navigation to one direction

If navigation is restricted, an arrowhead is added to indicate the direction of the navigation

- Dependency A dependency relationship is a weaker form of relationship showing a relationship between a client and a supplier where the client does not have semantic knowledge of the supplier A dependency is shown as a dashed line pointing from the client to the supplier

- Generalization: is a relationship between a general thing (called the super class or the parent) and a more specific kind of that thing (called the subclass(es) or the child).

- An association class is an association that also has class properties(or a class has association properties)

- A constraint is a semantic relationship among model elements that specifies conditions and propositions that must be maintained as true: otherwise, the system described by the model is invalid

- An interface is a specifier for the externally visible operations of a class, without specification of internal structure. An interface is formally equivalent to an abstract class with no attributes and no method implementations, only abstract operations (a short code sketch is given after this list of elements).

- A qualifier is an attribute or set of attributes whose values serve to partition the set of instances associated with an instance across an association.
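Returning to the interface element defined above, a minimal sketch in Java (names and operations are illustrative):

// An interface specifies externally visible operations only.
interface Shape {
    double area();        // abstract operation: no attributes, no implementation
    double perimeter();
}

// A class that realizes the interface supplies the internal structure.
class Circle implements Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    public double area()      { return Math.PI * radius * radius; }
    public double perimeter() { return 2 * Math.PI * radius; }
}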

3.5.3 Interaction Diagrams Interaction diagrams are used to model the dynamic behavior of the system. Interaction diagrams help us to identify the classes and their methods. Interaction diagrams describe how use cases are realized as interactions among objects. They show the classes, objects, actors and the messages between them needed to achieve the functionality of a use case.

There are two types of interaction diagrams: 1. Sequence Diagram 2. Collaboration Diagram. A sequence diagram shows the interaction of objects with respect to time. Sequence diagrams have two axes: the horizontal axis represents the objects involved in a sequence, and the vertical axis represents the passage of time.

The following sequence diagram realizes a scenario of reserving a copy of a book in a library.

Collaboration diagram: shows the interaction of the objects and also groups all the messages sent or received by an object. This allows us to see the complete set of services that an object must provide. The following collaboration diagram realizes a scenario of reserving a copy of a book in a library.

Difference between the Sequence Diagram and Collaboration Diagram Sequence diagrams: emphasize the temporal aspect of a scenario - they focus on time. Collaboration diagrams: emphasize the spatial aspect of a scenario - they focus on how objects are linked.

3.5.4 Activity Diagram An activity diagram is essentially a fancy flowchart. Activity diagrams and statechart diagrams are related. While a statechart diagram focuses attention on an object undergoing a process (or on a process as an object), an activity diagram focuses on the flow of activities involved in a single process. The activity diagram shows how those activities depend on one another. Activity diagrams can be divided into object swimlanes that determine which object is responsible for which activity. A single transition comes out of each activity, connecting it to the next activity. A transition may branch into two or more mutually exclusive transitions. Guard expressions (inside [ ]) label the transitions coming out of a branch. A branch and its subsequent merge marking the end of the branch appear in the diagram as hollow diamonds. A transition may fork into two or more parallel activities. The fork and the subsequent join of the threads coming out of the fork appear in the diagram as solid bars. The activity diagram for reserving a book (in the Library Management System) is shown below:

3.5.5 State Chart Diagram A state chart (transition) diagram is used to show: the life history of a given class, use case, or operation; the events that cause a transition from one state to another; and the actions that result from a state change.

State transition diagrams are created for objects with significant dynamic behavior. (More content is to be added.)

3.5.6 Component Diagram Describes the organization of, and dependencies between, the software implementation components.

Components are distributable physical units - e.g. source code, object code.

3.5.7 Deployment Diagram Describes the configuration of processing resource elements and the mapping of software implementation components onto them. Contains components (e.g. object code, source code) and nodes (e.g. printer, database, client machine).

4. Suggested References 1. The Unified Modeling Language User Guide (Authors: Grady Booch, James Rumbaugh, Ivar Jacobson) 2. UML Distilled (Author: Martin Fowler) 3. UML in a Nutshell (Author: Sinan Si Alhir) 4. The Elements of UML Style (Author: Scott W. Ambler) 5. http://www.omg.org/technology/documents/formal/uml.htm

Requirements Engineering 1. Introduction The objectives of this module are: to establish the importance/relevance of requirement specifications in software development; to bring out the problems involved in specifying requirements; and to illustrate the use of modelling techniques to minimise problems in specifying requirements.

Requirements can be defined as follows: A condition or capability needed by a user to solve a problem or achieve an objective. A condition or capability that must be met or possessed by a system to satisfy a contract, standard, specification, or other formally imposed document.

At a high level, requirements can be classified as user/client requirements and software requirements. Client requirements are usually stated in terms of business needs. Software requirements specify what the software must do to meet the business needs. For example, a stores manager might state his requirements in terms of efficiency in stores management. A bank manager might state his requirements in terms of time to service his customers. It is the analyst's job to understand these requirements and provide an appropriate solution. To be able to do this, the analyst must understand the client's business domain: who all the stakeholders are, how they affect the system, what the constraints are, what the alterables are, etc. The analyst should not blindly assume that only a software solution will solve a client's problem. He should have a broader vision. Sometimes, re-engineering of the business processes may be required to improve efficiency, and that may be all that is required. After all this, if it is found that a software solution will add value, then a detailed statement of what the software must do to meet the client's needs should be prepared. This document is called the Software Requirements Specification (SRS) document. Stating and understanding requirements is not an easy task. Let us look at a few examples. "The counter value is picked up from the last record": in this statement, the word 'last' is ambiguous. It could mean the last accessed record, which could be anywhere in a random access file, or it could be physically the last record in the file. "Calculate the inverse of a square matrix 'M' of size 'n' such that LM=ML=In, where 'L' is the inverse matrix and 'In' is the identity matrix of size 'n'": this statement, though it appears to be complete, does not specify the type of the matrix elements. Are they integers, real numbers, or complex numbers? Depending on the answer to this question, the algorithm will be different. "The software should be highly user friendly": how does one determine whether this requirement is satisfied or not?

"The output of the program shall usually be given within 10 seconds" What are the exceptions to the 'usual 10 seconds' requirement?

The statement of requirements or SRS should possess the following properties: All requirements must be correct; there should be no factual errors. All requirements should have one interpretation only; we have seen a few examples of ambiguous statements above. The SRS should be complete in all respects. It is difficult to achieve this objective; many times clients change the requirements as the development progresses or new requirements are added. The Agile development methodologies are specifically designed to take this factor into account: they partition the requirements into subsets called scenarios and each scenario is implemented separately. However, each scenario should be complete. All requirements must be verifiable, that is, it should be possible to verify if a requirement is met or not. Words like 'highly' and 'usually' should not be used. All requirements must be consistent and non-conflicting. As we have stated earlier, requirements do change, so the format of the SRS should be such that the changes can be easily incorporated.

2. Understanding Requirements 2.1 Functional and Non-Functional Requirements Requirements can be classified into two types, namely, functional requirements and non-functional requirements. Functional requirements specify what the system should do. Examples are: calculate the compound interest at the rate of 14% per annum on a fixed deposit for a period of three years; calculate tax at the rate of 30% on an annual income equal to and above Rs.2,00,000 but less than Rs.3,00,000; and invert a square matrix of real numbers (maximum size 100 x 100).
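A functional requirement of this kind translates almost directly into code and into test cases. The following is a minimal Python sketch of the income-tax example above; the function name is invented, and it assumes that the 30% applies to the whole income (the statement as written leaves that open), with incomes outside the stated band treated as out of scope.

    def tax_for_slab(annual_income):
        # 30% tax on an annual income >= Rs.2,00,000 and < Rs.3,00,000.
        # Incomes outside this band are not covered by this particular
        # requirement, so None is returned here (an assumption).
        if 200000 <= annual_income < 300000:
            return annual_income * 30 / 100   # assumed: 30% of the whole income
        return None

    # Because the requirement is precise, it is also verifiable:
    assert tax_for_slab(200000) == 60000.0
    assert tax_for_slab(199999) is None

Note how a vague statement such as "the software should be highly user friendly" admits no such direct test.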

Non-functional requirements specify the overall quality attributes the system must satisfy. The following is a sample list of quality attributes: portability, reliability, performance, testability, modifiability, security, presentation, reusability, understandability, acceptance criteria, interoperability.

Some examples of non-functional requirements are:

The number of significant digits to which accuracy should be maintained in all numerical calculations is 10. The response time of the system should always be less than 5 seconds. The software should be developed using the C language on a UNIX based system. A book can be deleted from the Library Management System by the Database Administrator only. The matrix diagonalisation routine should zero out all off-diagonal elements which are equal to or less than 10^-3. Experienced officers should be able to use all the system functions after a total training of two hours. After this training, the average number of errors made by experienced officers should not exceed two per day.

2.2 Other Classifications Requirements can also be classified into the following categories: satisfiability, criticality, stability, and user categories.

Satisfiability: There are three types of satisfiability, namely, normal, expected, and exciting. Normal requirements are specific statements of user needs. The user satisfaction level is directly proportional to the extent to which these requirements are satisfied by the system. Expected requirements may not be stated by the users, but the developer is expected to meet them. If the requirements are met, the user satisfaction level may not increase, but if they are not met, users may be thoroughly dissatisfied. They are very important from the developer's point of view. Exciting requirements are not only not stated by the users, the users do not even expect them. But if the developer provides for them in the system, the user satisfaction level will be very high. The trend over the years has been that the exciting requirements often become normal requirements and some of the normal requirements become expected requirements. For example, as the story goes, the on-line help feature was first introduced in the UNIX system in the form of man pages. At that time, it was an exciting feature. Later, other users started demanding it as part of their systems. Nowadays, users do not ask for it, but the developer is expected to provide it.

Criticality: This is a form of prioritising the requirements. They can be classified as mandatory, desirable, and non-essential. This classification should be done in consultation with the users and helps in determining the focus in an iterative development model. Stability: Requirements can also be categorised as stable and non-stable. Stable requirements don't change often, or at least the time period of change will be very long. Some requirements may change often. For example, if business process reengineering is going on alongside the development, then the corresponding requirements may change till the process stabilises. User categories: As was stated in the introduction, there will be many stakeholders in a system. Broadly they are of two kinds: those who dictate the policies of the system and those who utilise the services of the system. All of them use the system. There can be further subdivisions among these classes depending on the information needs and services required. It is important that all stakeholders are identified and their requirements are captured. 3. Modelling Requirements 3.1 Overview Every software system has the following essential characteristics: It has a boundary. The boundary separates what is within system scope and what is outside. It takes inputs from external agents and generates outputs.

It has processes which collaborate with each other to generate the outputs. These processes operate on data by creating, modifying, destroying, and querying it. The system may also use data stores to store data which has a life beyond the system.

In the following, we will be describing artifacts used by Structured Systems Analysis and Design Methodology (SSADM). It uses: Data Flow Diagram (DFD) for modelling processes and their interactions. Entity Relationship Diagram (ERD) for modelling data and their relationships. Data Dictionary to specify data Decision Tables and Decision Trees to model complex decisions. Structured English to paraphrase process algorithms. State Transition Diagram to model state changes of the system.

3.2 Data Flow Diagram (DFD) The data flow diagram focuses on the movement of data through the system and its transformations. It is divided into levels. Level 0, also known as the context diagram, defines the system scope. It consists of external agents, the system boundary, and the data flow between the external agents and the system. Level 1 is an explosion of Level 0, where all the major processes, data stores, and the data flows between them are shown. Level 2, Level 3, etc. show details of individual processes. The notation used is the following: External agents: They are external to the system, but interact with the system. They must be drawn at level 0, but need not be drawn at level 2 onwards. Duplicates are to be identified. They must be given meaningful names.

Process: They indicate an information processing activity. They must be shown at all levels. At level 0, only a single process, depicting the whole system, is shown. On subsequent levels, the number of processes should be limited to 7 ± 2. No duplicates are allowed.

Data Stores: They are used to store information. They are not shown at level 0. All data stores should be shown at level 1. Duplicates must be indicated.

Data Flows: They indicate the flow of information. They must be shown at all levels and meaningful names must be given.

Examples: 1. Customer places sales orders. The system checks for availability of products and updates sales information

2. Company receives applications. Checks for eligibility conditions. Invites all eligible candidates for interview. Maintains a list of all candidates called for interview. Updates the eligibility conditions as and when desired by the management

Getting started: Identify the inputs or events which trigger the system and the outputs or responses from the system. Identify the corresponding sources and destinations (external agents). Produce a context diagram (Level 0). It should show the system boundary, external agents, and the dataflows connecting the system and the external agents. Produce the Level 1 diagram. It must show all the external agents, all the major processes, all the data stores, and all the dataflows connecting the various artifacts. The artifacts should be placed based on logical precedence rather than temporal precedence. Avoid dataflow crossings. Refine the Level 1 diagram. Explode the individual processes as necessary.

Points to remember: 1) Remember to name every external agent, every process, every data store, and every dataflow. 2) Do not show how things begin and end. 3) Do not show loops, and decisions. 4) Do not show dataflows between external agents. They are outside the scope of the system.

5) Do not show dataflow between an external agent and a data store. There should be a process in between.

6) Do not show dataflow between two data stores. There should be a process in between.

7) There should not be any unconnected external agent, process, or data store. 8) Beware of read-only or write-only data stores

9) Beware of processes which take inputs without generating any outputs. Also, beware of processes which generate outputs spontaneously without taking any inputs.

10) Ensure that the data flowing into a process exactly matches the data flowing into the exploded view of that process. Similarly for the data flowing out of the process. 11) Ensure that the data flowing out of a data store matches data that has been stored in it before. See the appendix for the complete data flow diagram of the "Material Procurement System (Case Study)". 3.3 Entity Relationship Diagram (ERD) The ERD complements the DFD. While the DFD focuses on processes and the data flow between them, the ERD focuses on data and the relationships between them. It helps to organise the data used by a system in a disciplined way. It helps to ensure completeness, adaptability and stability of data. It is an effective tool to communicate with senior management (what is the data needed to run the business), data administrators (how to manage and control data), and database designers (how to organise data efficiently and remove redundancies). It consists of three components. Entity: It represents a collection of objects or things in the real world whose individual members or instances have the following characteristics:

Each can be identified uniquely in some fashion. Each plays a necessary role in the system we are building. Each can be described by one or more data elements (attributes).

Entities generally correspond to persons, objects, locations, events, etc. Examples are employee, vendor, supplier, materials, warehouse, delivery, etc. There are five types of entities. Fundamental entity: It does not depend on any other entity for its existence, e.g. materials. Subordinate entity: It depends on another entity for its existence. For example, in an inventory management system, a purchase order can be an entity and it will depend on the materials being procured. Similarly, invoices will depend on purchase orders. Associative entity: It depends on two or more entities for its existence. For example, student grades will depend on the student and the course. Generalisation entity: It encapsulates the common characteristics of many subordinate entities. For example, a four wheeler is a type of vehicle, and a truck is a type of four wheeler. Aggregation entity: It consists of, or is an aggregation of, other entities. For example, a car consists of an engine, a chassis, a gear box, etc. A vehicle can also be regarded as an aggregation entity, because a vehicle can be regarded as an aggregation of many parts.

Attributes: They express the properties of the entities. Every entity will have many attributes, but only a subset, which are relevant for the system under study, will be chosen. For example, an employee entity will have professional attributes like name, designation, salary, etc. and also physical attributes like height, weight, etc. But only one set will be chosen depending on the context. Attributes are classified as entity keys and entity descriptors. Entity keys are used to uniquely identify instances of entities. Attributes having unique values are called candidate keys and one of them is designated as primary key. The domains of the attributes should be pre-defined. If 'name' is an attribute of an entity, then its domain is the set of strings of alphabets of predefined length.
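As a small illustration of entity keys, the Python sketch below models a hypothetical employee entity; employee_id and pan_number are assumed to be the candidate keys, with employee_id designated as the primary key used to index instances. The attribute choices are illustrative, not prescriptive.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Employee:
        # Candidate keys: both identify an instance uniquely;
        # employee_id is designated as the primary key.
        employee_id: str
        pan_number: str
        # Entity descriptors chosen for a payroll context; physical
        # attributes such as height or weight are deliberately left out.
        name: str
        designation: str
        salary: float

    employees = {}                             # primary key -> instance
    e = Employee("E042", "ABCPE1234K", "Asha", "Analyst", 55000.0)
    employees[e.employee_id] = e               # instances are looked up by primary key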

Relationships: They describe the association between entities. They are characterised by optionality and cardinality. Optionality is of two types, namely, mandatory and optional. 1. A mandatory relationship means that with every instance of the first entity there is associated at least one instance of the second entity. 2. An optional relationship means that there may be instances of the first entity which are not associated with any instance of the second entity. For example, the

employee-spouse relationship has to be optional because there could be unmarried employees. It is not correct to make the relationship mandatory. Cardinality is of three types: one-to-one, one-to-many, many-to-many. 1. One-to-one relationship means an instance of the first entity is associated with only one instance of the second entity. Similarly, each instance of the second entity is related to one instance of the first entity. 2. One-to-many relationship means that one instance of the first entity is related to many instances of the second entity, while an instance of the second entity is associated with only one instance of the first entity. 3. In many-to-many relationship an instance of the first entity is related to many instances of the second entity and the same is true in the reverse direction also. Other types of relationships are multiple relationships between entities, relationships leading to associative entities, relationship of entity with itself, EXCLUSIVE-OR and AND relationships
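Although an ERD is independent of any implementation, it can help to see how optionality and cardinality eventually surface in code. The Python sketch below is only an illustration (the classes and fields are assumptions): it shows a one-to-many association (a company manufactures many cars, each car is made by one company) and an optional one-to-one association (employee-spouse).

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Car:
        chassis_no: str

    @dataclass
    class Company:
        name: str
        cars: List[Car] = field(default_factory=list)   # one-to-many

    @dataclass
    class Spouse:
        name: str

    @dataclass
    class Employee:
        employee_id: str
        spouse: Optional[Spouse] = None    # optional: unmarried employees have none

    # Each car appears in exactly one company's list, so the reverse
    # direction of the association is one (car) to one (company).
    xyz = Company("XYZ Motors", [Car("CH-001"), Car("CH-002")])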

ERD notation: There are two types of notation in common use: 1. Peter Chen notation 2. Bachman notation. Not surprisingly, the notations are named after their inventors, Peter Chen and Bachman. The following table gives the notation.

[Table not reproduced: it shows how each component (entity or object type, relationship, cardinality, and optionality) is represented in the Peter Chen and Bachman notations, with PURCHASE ORDER as the example entity.]

Example for Bachman notation (figure)

Example for Peter Chen notation (figure)

Given below are a few examples of ER diagrams using Bachman notation. First the textual statement is given followed by the diagram 1. In a company, each division is managed by only one manager and each manager manages only one division

2. Among the automobile manufacturing companies, a company manufactures many cars, but a given car is manufactured in only one company

3. In a college, every student takes many courses and every course is taken by many students

4. In a library, a member may borrow many books and there may be books which are not borrowed by any member

5. A teacher teaches many students and a student is taught by many teachers. A teacher conducts examination for many students and a student is examined by many teachers.

6. An extension of example-3 above is that student-grades depend upon both student and the course. Hence it is an associative entity

7. An employee can play the role of a manager. In that sense, an employee reports to another employee.

8. A tender is floated either for materials or services but not both.

9. A car consists of an engine and a chassis

3.4 Data Dictionary It contains: an organised list of data elements, data structures, data flows, and data stores; mini specifications of the primitive processes in the system; and any other details which will provide useful information on the system.

A data element is a piece of data which cannot be decomposed further in the current context of the system.

Examples are purchase_order_no., employee_name, interest_rate, etc. Each data element is a member of a domain. The dictionary entry of a data element should also specify the domain. A data structure is composed of data elements or other data structures. An example is Customer_details, which may be composed of Customer_name and Customer_address. Customer_address in turn is a structure. Another example is Invoice, which may be composed of Invoice_identification, Customer_details, Delivery_address, Invoice_details. A data flow is composed of data structures and/or data elements. Definitions of dependent data structures/data elements precede the definition of the data flow. While defining the data flow, the connecting points should be mentioned.

It is also useful to include the flow volume/frequency and growth rates. A data store, like a data flow, is made up of a combination of data structures and/or data elements. The description is similar to that of data flows. The notation elements used in the data dictionary are the following: [spouse_name] indicates that spouse_name is optional; {dependent_name, relationship} * (0 to 15) indicates that the data structure can be repeated 0 to 15 times; {expense_description, company_name, charge} * (1 to N) indicates that the data structure may be repeated 1 to N times, where N is not fixed; voter_identity_number/customer_account_number indicates that either of the elements will be present.
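If a team wants these definitions to be machine-checkable, they can be mirrored as type declarations; the Python sketch below is only an illustration of that idea and is not part of the SSADM notation (the field breakdowns are assumptions based on the examples above).

    from typing import List, TypedDict

    class Customer_address(TypedDict):
        street: str
        city: str
        pin_code: str

    class Customer_details(TypedDict):
        # Customer_details = Customer_name + Customer_address
        Customer_name: str
        Customer_address: Customer_address

    class Dependent(TypedDict):
        dependent_name: str
        relationship: str

    class Employee_record(TypedDict, total=False):
        spouse_name: str                # [spouse_name]: optional element
        dependents: List[Dependent]     # {dependent_name, relationship} * (0 to 15);
                                        # the 0-to-15 bound is not expressible in the
                                        # type and must be checked separately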

The data dictionary also contains mini specifications. They state the ways in which data flows that enter a primitive process are transformed into data flows that leave the process. Only the broad outline is given, not the detailed steps. They must exist for every primitive process. Structured English is used for stating mini specifications.

Once the DFD, ERD, and the data dictionary are created, the three of them must be matched against each other. The DFD and ERD can be created independently and in parallel. Every data store in the DFD must correspond to at least one entity in the ERD. There should be processes in the DFD which create, modify, and delete instances of the entities in the ERD. For every relationship in the ERD there should be a process in the DFD which uses it. For every description in the data dictionary, there should be corresponding elements in the DFD and ERD.
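These cross-checks are mechanical enough to automate once the models are captured as data. The Python sketch below assumes deliberately simple representations (sets and dicts of names, with data stores matched to entities by name), which is a simplification for illustration rather than part of SSADM.

    def check_models(dfd_data_stores, erd_entities, processes_by_entity):
        # dfd_data_stores: set of data store names in the DFD
        # erd_entities: set of entity names in the ERD
        # processes_by_entity: dict mapping an entity name to the DFD processes
        #     that create, modify, or delete its instances
        problems = []
        for store in dfd_data_stores:
            if store not in erd_entities:
                problems.append(f"data store '{store}' has no corresponding entity")
        for entity in erd_entities:
            if not processes_by_entity.get(entity):
                problems.append(f"no process creates/modifies/deletes '{entity}'")
        return problems

    # Example (hypothetical model fragments):
    print(check_models({"MPR Register"}, {"MPR Register", "Purchase Order"},
                       {"MPR Register": ["Validate MPR"], "Purchase Order": []}))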

3.5 Decision Tree and Decision Tables A decision tree represents complex decisions in the form of a tree. Though visually it is appealing, it can soon get out of hand when the number and complexity of decisions increase. An example is given below. First the textual statement is given and then the corresponding decision tree is given. Rules for electricity billing are as below: If the meter reading is "OK", calculate on consumption basis (i.e. on the meter reading). If the meter reading appears "LOW", then check if the house is occupied; if the house is occupied, calculate on seasonal consumption basis, otherwise calculate on consumption basis. If the meter is damaged, calculate based on maximum possible electricity usage.

There are two types of decision tables, binary-valued (yes or no) and multi-valued. An example follows: ELECTRICITY BILL CALCULATION BASED ON CUSTOMER CLASS If a customer uses electricity for domestic purposes and if the consumption is less than 300 units per month, then bill with minimum monthly charges. Domestic customers with a consumption of 300 units or more per month are billed at a special rate. Non-domestic users are charged double that of domestic users (the minimum and special rates are doubled).

BINARY-VALUED DECISION TABLE

                                    R1   R2   R3   R4
Domestic Customer                   Y    Y    N    N
Consumption < 300 units per month   Y    N    Y    N
Minimum rate                        Y    N    N    N
Special rate                        N    Y    N    N
Double minimum rate                 N    N    Y    N
Double special rate                 N    N    N    Y

MULTI-VALUED DECISION TABLE

Customer   Consumption   Rate
D          >= 300        S
D          < 300         M
N          >= 300        2S
N          < 300         2M

Like decision trees, binary-valued decision tables can grow large if the number of rules increases. Multi-valued decision tables have an edge. In the above example, if we add a new class of customers, called Academic, with the rules: if the consumption is less than 300 units per month then bill at concessional rates, otherwise bill at twice the concessional rates, then the new tables will look like the following:

BINARY-VALUED DECISION TABLE (three rows and two columns are added to deal with the extra class of customers)

                                R1   R2   R3   R4   R5   R6
Academic                        N    N    N    N    Y    Y
Domestic customer               Y    Y    N    N    N    N
Consumption < 300 units/month   Y    N    Y    N    Y    N
Minimum rate                    Y    N    N    N    N    N
Special rate                    N    Y    N    N    N    N
Twice minimum rate              N    N    Y    N    N    N
Twice special rate              N    N    N    Y    N    N
Concessional rate               N    N    N    N    Y    N
Twice concessional rate         N    N    N    N    N    Y

MULTI-VALUED DECISION TABLE (only two rows are added to deal with the extra class of customers)

Customer       Consumption   Rate
Domestic       >= 300        Special
Domestic       < 300         Minimum
Non-domestic   >= 300        Twice special
Non-domestic   < 300         Twice minimum
Academic       >= 300        Twice concessional
Academic       < 300         Concessional
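Because each rule of a multi-valued decision table is simply one row, the table maps directly onto a lookup structure in code. In the Python sketch below the rate names stand in for actual tariff values (an assumption); the point is that adding the Academic class only adds two entries, exactly as it only adds two rows to the table.

    # (customer class, consumption band) -> rate, mirroring the table above
    RATE_TABLE = {
        ("Domestic",     ">=300"): "Special",
        ("Domestic",     "<300"):  "Minimum",
        ("Non-domestic", ">=300"): "Twice special",
        ("Non-domestic", "<300"):  "Twice minimum",
        ("Academic",     ">=300"): "Twice concessional",
        ("Academic",     "<300"):  "Concessional",
    }

    def billing_rate(customer_class, units_per_month):
        band = "<300" if units_per_month < 300 else ">=300"
        return RATE_TABLE[(customer_class, band)]

    assert billing_rate("Academic", 120) == "Concessional"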

3.6 Structured English To specify the processes (mini specifications), Structured English is used. It consists of: sequences of instructions (action statements), decisions (if-else), loops (repeat-until), case constructs, and groups of instructions.

Examples: Loose, normal English: In the case of 'Bill', a master file is updated with the bill (that is, consumer account number and bill date). A control file is also to be updated for the 'total bill amount'. A similar treatment is to be given to 'Payment'. Structured English:
If transaction is 'BILL' then
    update bill in the Accounts master file
    update total bill amount in the Control file
If transaction is 'PAYMENT' then
    update receipt in the Accounts master file
    update total receipt amount in the Control file

Another example:
If previous reading and new reading match then
    perform 'status-check'
    If status is 'dead' then
        calculate bills based on average consumption
Else
    compute bill based on actual consumption

status-check:
If meter does not register any change after switching on any electrical device then
    meter status is 'dead'
Else
    meter status is 'ok'

3.7 State Transition Diagram Another useful diagram is the state transition diagram. It can be used to model the state changes of the system. A system is in a state and will remain in that state till a condition and an action force it to change state. See the following figure. The Appendix contains another example.
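A state transition diagram can also be recorded as a transition table, which makes the valid and invalid event/state combinations explicit. The Python sketch below uses a hypothetical library book copy; the states and events are assumptions chosen to echo the earlier library examples, not part of any case study.

    # (current state, event) -> next state
    TRANSITIONS = {
        ("Available", "reserve"): "Reserved",
        ("Available", "issue"):   "Issued",
        ("Reserved",  "issue"):   "Issued",
        ("Reserved",  "cancel"):  "Available",
        ("Issued",    "return"):  "Available",
    }

    def next_state(state, event):
        try:
            return TRANSITIONS[(state, event)]
        except KeyError:
            # the event is not valid in this state, so no transition occurs
            raise ValueError(f"event '{event}' is not valid in state '{state}'")

    assert next_state("Available", "reserve") == "Reserved"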

4. Conclusion The output of the requirements engineering phase is the software requirements specifications (SRS) document. At a minimum, it should contain the DFD, ERD, and the

data dictionary and the mini specifications. The other diagrams may be used as required. The standards body, the Institute of Electrical and Electronics Engineers (IEEE), has defined a set of recommended practices called "IEEE Recommended Practice for Software Requirements Specifications", IEEE Standard 830-1998. It can be used as a guideline document for the SRS. 5. Appendix Case Study: Material Procurement System XYZ is a company which manufactures fertilizers (soil nutrition chemicals) and chemicals for agricultural use. The company has its head office in Delhi and a manufacturing plant in Surat. The manufacturing plant is looked after by a Director-Operations. The Director-Operations is supported by various departmental heads for executing day-to-day operations; plant production, plant maintenance, finance, purchase, stores, personnel, fertilizer transportation and plant administration are some of the major departments. Our study is restricted to the purchase operations and related interfaces. Currently, all purchase operations are being carried out manually. Management has decided to support purchase operations with a computer. An extract of the discussions with the purchase manager and his staff related to current procedures is enclosed. An extract of discussions held with the purchase manager and his staff: A. Overview of the Department The Purchase Department is headed by a Purchase Manager and he reports directly to the Director-Operations. The Purchase Manager is supported by Material Procurement Officers (Buyers) and clerical staff; the total strength of the department is 20. The Purchase Department's main function is to procure material for production, maintenance and other departments in time, so that there should not be any situation where the plant has to be shut down for want of some material requested by the departments. At present, there are more than 300 Purchase Orders (POs) open at any time (a closed Purchase Order is defined as a Purchase Order for which all material ordered is received and paid for) and it has become impossible to keep track of activities and provide up-to-date status of these activities to the concerned departments. The idea behind seeking computer support is to access information quickly and monitor purchase activity efficiently. It is also expected that after computerisation, with the same manpower, the future purchase load (related to diversification plans of the company) can also be met. The various procedures followed by the Purchase Department have already been studied and improved by a company-approved management consultant and, unless really necessary, no procedures should be changed while providing the necessary computer support.

B. Procedural Details Related to Purchase Activity User Departments (i.e. Production, Maintenance, etc.) prepare their own material purchase requests (MPRs) and send these to the Purchase Department as and when a requirement comes up. More than one material procurement detail can be given in the same MPR. Material procurement details contain material code, description and quantity. The Purchase Department validates the MPR for the material requested and the MPRs received. MPRs are accepted by the Purchase Department only if the required quantity is not available in stores (stock clearance by the Stores Dept.) and the concerned department has not exhausted the allocated budgets for procurement. Stock clearance from the stores dept. is required only for non-stock items of a specific class. (Allocated budgets are provided by the Finance Department at the beginning of the financial year and are available with the Purchase Department.) Whenever a buyer who handles a particular material group (25 material groups have been formed on the basis of similarity of materials) gets time, he reviews the MPR Register and consolidates (from various MPRs) the material requirement for his groups. After this exercise, he raises an enquiry (request for quotation) with registered vendors (vendors are registered with the company for specific classes of items). An enquiry can contain one or many materials of a material group (for example, 1.5" and 1" bolts, which have separate material codes, can be part of a single enquiry); every enquiry has only one closing date (i.e. the last date of receipt of quotations from vendors). Vendors submit technical and commercial quotations (2-stage bidding). If any technical discrepancy is found, such quotations are rejected after the closing date for an enquiry. All commercial quotations received are opened by the concerned buyer (in the presence of the Purchase Manager and vendors) and afterwards are analysed for total value (basic cost of materials, taxes, insurance, packing, freight). On a lowest-cost basis an offer is selected and a purchase order proposal is prepared. A PO proposal contains details of all materials, payment terms and all other relevant terms and conditions. This PO proposal is sent to the Finance Dept. and a financial concurrence is obtained. Sometimes, in the organisation's interest, more than one Purchase Order is placed for an enquiry (for example, out of 4 materials involved in an enquiry, 2 are cheaper from one vendor and the rest are cheaper from another vendor). In a Purchase Order every material ordered has a due date for delivery. Also, in some purchase orders, where the quantity ordered is large (this situation is typical for bulk material procurement - cement, steel, etc.) staggered delivery of material is mentioned. A PO is considered effective from the date of acceptance of the PO by the vendor. Vendors deliver material (against the PO) to the Stores Department and submit an invoice to the Purchase Department. On receiving the "Acceptance" intimation from stores, the Purchase Department checks and passes the invoice and payment advice to the Finance Department for payment. In case of any discrepancy in the terms & conditions, such invoices are rejected. While passing invoices, adjustments are made for any material which is rejected by stores (such data is mentioned by Stores in the Material Receipt Intimation). Environmental Model (Ref. Material Procurement System) Event List 1. Receipt of Purchase Request from Departments 2. Receipt of Stock Clearance Data 3. Receipt of Quotations from Vendors

4. Order Acceptance from Vendors 5. Availability of Material Receipt Data 6. Invoice Submission by Vendors 7. Receipt of Financial Concurrence 8. Receipt of Allocated Budgets

Level 1 - Data Flow Diagram

Level 2 - Data Flow Diagram

6. References 1. Software Requirements - Analysis and Specification - Alan M. Davis, Prentice Hall. 2. Systems Analysis & Design Methods - J. L. Whitten, L. D. Bentley and V. M. Barlow - Richard D. Irwin Inc. / Galgotia Publications Pvt. Ltd. 3. Software Engineering - Ian Sommerville - Addison-Wesley Publishers Ltd. 4. Introducing Systems Analysis - Steven Skidmore - BPB Publications.

Reviews, Walkthroughs & Inspections


1. Formal Definitions Quality Control (QC) A set of techniques designed to verify and validate the quality of work products and observe whether requirements are met. Software Element Every deliverable or in-process document produced or acquired during the Software Development Life Cycle (SDLC) is a software element. Verification and validation techniques Verification - Is the task done correctly? Validation - Is the correct task done? Static Testing V&V is done on any software element. Dynamic Testing V&V is done on executing the software with pre-defined test cases. 2. Importance of Static Testing Why Static Testing? The benefit is clear once you think about it. If you can find a problem in the requirements before it turns into a problem in the system, that will save time and money. The following statistics would be mind boggling.

M.E. Fagan "Design and Code Inspections to Reduce Errors in Program Development", IBM Systems Journal, March 1976. Systems Product 67% of total defects during the development found in Inspection Applications Product 82% of all the defects found during inspection of design and code

A.F. Ackerman, L. Buchwald, and F. Lewski, "Software Inspections: An Effective Verification Process," IEEE Software, May 1989. Operating System Inspection decreased the cost of detecting a fault by 85%

Marilyn Bush, "Improving Software Quality: The Use of Formal Inspections at the Jet Propulsion Laboratory", Proceedings of the 12th International Conference on Software Engineering, pages 196-199, IEEE Computer Society Press, Nice, France, March 1990. Jet Propulsion Laboratory Project Every two-hour inspection session results, on an average, in saving of $25,000

The following three stories should communicate the importance of Static Testing: When my daughter called me a cheat My failure as a programmer Loss of the Mars Climate Orbiter A Few More Software Failures - Lessons for others The following diagram of Fagan (Advances in Inspections, IEEE Transactions on Software Engineering) captures the importance of Static Testing. The lesson learned could be summarized in one sentence - spend a little extra earlier or spend much more later.

The statistics, the above stories and Fagan's diagram emphasize the need for Static Testing. It is appropriate to state that not all static testing involves people sitting at a table looking at a document. Sometimes automated tools can help. For C programmers, the lint program can help find potential bugs in programs. Java programmers can use tools like the JTest product to check their programs against a coding standard. When to start Static Testing? To get value from static testing, we have to start at the right time. For example, reviewing the requirements after the programmers have finished coding the entire system may help testers design test cases. However, the significant return on the static testing investment is no longer available, as testers can't prevent bugs in code that's already written. For optimal returns, static testing should happen as soon as possible after the item to be tested has been created, while the assumptions and inspirations remain fresh in the creator's mind and none of the errors in the item have caused negative consequences in downstream processes. Effective reviews involve the right people. Business domain experts must attend requirements reviews, system architects must attend design reviews, and expert programmers must attend code reviews. As testers, we can also be valuable participants, because we're good at spotting inconsistencies, vagueness, missing details, and the like. However, testers who attend review meetings do need to bring sufficient knowledge of the business domain, system architecture, and programming to each review. And everyone who attends a review, walkthrough or inspection should understand the basic ground rules of such events. The following diagram of Sommerville (Software Engineering, 6th Edition) communicates where Static Testing starts.

3. Reviews IEEE classifies Static Testing under three broad categories: Reviews Walkthroughs Inspections

What is a Review? A meeting at which the software element is presented to project personnel, managers, users, customers or other interested parties for comment or approval. The software element can be Project Plans, URS, SRS, Design Documents, code, Test Plans, or the User Manual. What are the objectives of Reviews? To ensure that: the software element conforms to its specifications; the development of the software element is being done as per the plans, standards, and guidelines applicable for the project; and changes to the software element are properly implemented and affect only those system areas identified by the change specification.

Reviews - Input A statement of objectives for the technical reviews The software element being examined Software project management plan Current anomalies or issues list for the software product Documented review procedures Earlier review report - when applicable Review team members should receive the review materials in advance and they come prepared for the meeting Check list for defects

Reviews Meeting

Examine the software element for adherence to specifications and standards Changes to software element are properly implemented and affect only the specified areas Record all deviations Assign responsibility for getting the issues resolved Review sessions are not expected to find solutions to the deviations. The areas of major concerns, status on previous feedback and review days utilized are also recorded. The review leader shall verify, later, that the action items assigned in the meeting are closed

Reviews - Outputs List of review findings List of resolved and unresolved issues found during the later re-verification

4. Walkthrough Walkthrough Definition A technique in which a designer or programmer leads the members of the development team and other interested parties through the segment of the documentation or code and the participants ask questions and make comments about possible errors, violation of standards and other problems. Walkthrough - Objectives To find defects To consider alternative implementations To ensure compliance to standards & specifications

Walkthrough Input A statement of objectives The software element for examination Standards for the development of the software Distribution of materials to the team members, before the meeting Team members shall examine and come prepared for the meeting Check list for defects

Walkthrough- Meeting Author presents the software element Members ask questions and raise issues regarding deviations Discuss concerns, perceived omissions or deviations from the specifications Document the above discussions Record the comments and decisions The walk-through leader shall verify, later, that the action items assigned in the meeting are closed

Walkthrough Outputs

List of walk-through findings List of resolved and unresolved issues found during the later re-verification

5. Inspection Inspection Definition A visual examination of software element to detect errors, violations of development standards and other problems. An inspection is very rigorous and the inspectors are trained in inspection techniques. Determination of remedial or investigative action for a defect is a mandatory element of a software inspection, although the solution should not be determined in the inspection meeting. Inspection Objectives To verify that the software element satisfies the specifications & conforms to applicable standards To identify deviations To collect software engineering data like defect and effort To improve the checklists, as a spin-off

Inspection Input A statement of objectives for the inspection Having the software element ready Documented inspection procedure Current defect list A checklist of possible defects Arrange for all standards and guidelines Distribution of materials to the team members, before the meeting Team members shall examine and come prepared for the inspection

Inspection Meeting Introducing the participants and describing their role (by the Moderator) Presentation of the software element by non - author Inspectors raise questions to expose the defects Recording defects - a defect list details the location, description and severity of the defect Reviewing the defect list - specific questions to ensure completeness and accuracy Making exit decision - Acceptance with no or minor rework, without further verification - Accept with rework verification (by inspection team leader or a designated member of the inspection team) - Re-inspect Inspection - Output Defect list, containing a defect location, description and classification An estimate of rework effort and rework completion date

6. Comparison of Reviews, Walk-Throughs and Inspections Objectives: Reviews - Evaluate conformance to specifications; Ensure change integrity Walkthroughs - Detect defects; Examine alternatives; Forum for learning Inspections - Detect and identify defects; Verify resolution

Group Dynamics: Reviews: 3 or more persons ; Technical experts and peer mix Walkthroughs: 2 to 7 persons Technical experts and peer mix Inspections: 3 to 6 persons; Documented attendance; with formally trained inspectors

Decision Making & Change Control: Reviews: Review Team requests Project Team leadership or management to act on recommendations Walkthroughs: All decisions made by producer; Change is prerogative of the author Inspections: Team declares exit decision Acceptance, Rework & Verify or Rework & Re-inspect

Material Volume: Reviews: Moderate to high Walkthroughs: Relatively low Inspections: Relatively low

Presenter Reviews: Software Element Representative Walkthroughs: Author Inspections: Other than author

7. Advantages Advantages of Static Methods over Dynamic Methods Early detection of software defects Static methods expose defects, whereas dynamic methods show only the symptom of the defect Static methods expose a batch of defects, whereas it is usually one by one in dynamic methods Some defects can be found only by Static Testing o Code redundancy (when logic is not affected) o Dead code o Violations of coding standards
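As a small illustration of one of the defect classes listed above, the (hypothetical) Python function below contains dead code. Every dynamic test of its return value passes, so only static examination - a human reviewer or a lint-style tool - is likely to notice the problem.

    def absolute_value(x):
        if x >= 0:
            return x
        else:
            return -x
        # Dead code: the statement below can never execute, yet no dynamic
        # test of absolute_value() will ever reveal that it is here.
        print("computed absolute value")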

8. Metrics for Inspections

Fault Density: for specifications and design, faults per page; for code, faults per 1000 lines of code.

Fault Detection Rate: faults detected per hour. Fault Detection Efficiency: faults detected per person-hour. Inspection Efficiency: (number of faults found during inspection) / (total number of faults found during development). Maintenance vs Inspection: compare the "number of corrections during the first six months of the operational phase" with the "number of defects found in inspections" for different projects of comparable size. 9. Common Issues for Reviews, Walk-throughs and Inspections Responsibilities for Team Members Leader: to conduct / moderate the review / walkthrough / inspection process. Reader: to present the relevant material in a logical fashion. Recorder: to document defects and deviations. Other Members: to critically question the material being presented.

Communication Factors The discussion should be: kept within bounds; not extreme in opinion; reasonable and calm; not directed at a personal level; concentrated on finding defects; not bogged down in disagreement; not about trivial issues; and not a solution-hunting exercise. The participants should be sensitive to each other by keeping the synergy of the meeting very high.

Being aware of, and correcting, any conditions, either physical or emotional, that are draining off the participants' attention shall ensure that the meeting is fruitful, i.e. the maximum number of defects is found during the early stages of the software development life cycle.

10. Basic References 1. IEEE Standard for Software Reviews, Std 1028-1997 2. Handbook of Walkthroughs, Inspections, and Technical Reviews (Third Edition), Daniel P. Freedman and Gerald M. Weinberg, Dorset House, 1990 Literature References for further reading: 1. M. E. Fagan, "Advances in Software Inspections," IEEE Transactions on Software Engineering, vol. 12, no. 7, pp. 744-751, July 1986. 2. E. P. Doolan, "Experience with Fagan's Inspection Method," Software - Practice and Experience, vol. 22, no. 2, pp. 173-182, February 1992. 3. G. W. Russell, "Experience with Inspection in Ultralarge-Scale Developments," IEEE Software, vol. 8, no. 1, pp. 25-31, January 1991. 4. M. E. Fagan, "Design and Code Inspections to Reduce Errors in Program Development," IBM Systems Journal, vol. 15, no. 3, pp. 182-211, March 1976. 5. A. F. Ackerman, L. Buchwald, and F. Lewski, "Software Inspections: An Effective Verification Process," IEEE Software, vol. 6, no. 3, pp. 31-36, May 1989. 6. Priscilla J. Fowler, "In-process Inspections of Workproducts at AT&T," AT&T Technical Journal, vol. 65, no. 2, pp. 102-112, March 1986. 7. Marilyn Bush, "Improving Software Quality: The Use of Formal Inspections at the Jet Propulsion Laboratory," Proceedings of the 12th International Conference on Software Engineering, Nice, France, March 1990, IEEE Computer Society Press, pp. 196-199.

Software Development Process


1. Introduction Computers are becoming a key element in our daily lives. Slowly and surely they are taking over many of the functions that affect our lives critically. They are now controlling all forms of monetary transactions, manufacturing, transportation, communication, defence systems, process control systems, and so on. In the near future, they will be found in our homes, controlling all forms of appliances. Left to themselves, they are harmless pieces of hardware. Load the right kind of software, and they can take you to the moon, both literally and figuratively. It is the software that gives life to them. When they are going to play such a crucial role, one small flaw either in the hardware or the software can lead to catastrophic consequences. The sad part is that, while there are well defined processes based on theoretical foundations to ensure the reliability of the hardware, the same thing cannot be said about software. There is no theory for software development as yet. But at the same time, it is mandatory that software always behaves in a predictable manner, even in unforeseen circumstances. Hence there is a need to control its development through a well defined and systematic process. The old fashioned 'code & test' approach will not do any more. It may be good enough for 'toy' problems, but in real life, software is expected to solve enormously complex problems. Some of the aspects of real life software projects are: Team effort: Any large development effort requires the services of a team of specialists. For example, the team could consist of domain experts, software design experts, coding specialists, testing experts, hardware specialists, etc. Each group could concentrate on a specific aspect of the problem and design a suitable solution. However, no group can work in isolation. There will be constant interaction among team members. Methodology: Broadly there are two types of methodologies, namely, 'procedure oriented methodologies' and 'object oriented methodologies'. Though theoretically either of them could be used in any given problem situation, one of them should be chosen in advance. Documentation: Clear and unambiguous documentation of the artifacts of the development process is critical for the success of the software project. Oral communication and 'back of the envelope' designs are not sufficient. For example, documentation is necessary if client signoff is required at various stages of the process. Once developed, the software lives for a long time. During its life, it has to undergo a lot of changes. Without clear design specifications and well documented code, it will be impossible to make changes. Planning: Since the development takes place against a client's requirements, it is imperative that the whole effort is well planned to meet the schedule and cost constraints. Quality assurance: Clients expect value for money. In addition to meeting the client's requirements, the software also should meet additional quality constraints. They could be in terms of performance, security, etc. Lay user: Most of the time, these software packages will be used by non-computer-savvy users. Hence the software has to be highly robust. Software tools: Documentation is important for the success of a software project, but it is a cumbersome task and many software practitioners balk at the prospect of documentation. There are tools known as Computer Aided Software Engineering (CASE) tools which simplify the process of documentation.
Conformance to standards: We need to follow certain standards to ensure clear and unambiguous documentation. For example, IEEE standards for requirements

specifications, design, etc. Sometimes, clients may specify the standards to be used. Reuse: The development effort can be optimised by reusing well-tested components. For example, mathematical libraries, graphical user interface tool kits, EJBs, etc. Non-developer maintenance: Software lives for a long time. The development team may not be available to maintain the package. Some other team will have to ensure that the software continues to provide services. Change management: Whenever a change has to be made, it is necessary to analyse its impact on various parts of the software. Imagine modifying the value of a global variable. Every function that accesses the variable will be affected. Unless care is taken to minimise the impact, the software may not behave as expected. Version control: Once changes are made to the software, it is important that the user gets the right copy of the software. In case of failures, it should be possible to roll back to the previous versions. Subject to risks: Any large effort is subject to risks. The risks could be in terms of non-availability of skills, technology, inadequate resources, etc. It is necessary to constantly evaluate the risks and put in place risk mitigation measures.

2. Software Quality The goal of any software development process is to produce high quality software. What is software quality? It has been variously defined as: Fitness for purpose Zero defects Conformability & dependability The ability of the software to meet customer's stated and implied needs

Some of the important attributes that can be used to measure software quality are: Correctness: Software should meet the customer's needs. Robustness: Software must always behave in an expected manner, even when unexpected inputs are given. Usability: Ease of use. Software with a graphical user interface is considered more user-friendly than software without one. Portability: The ease with which software can be moved from one platform to another. Efficiency: Optimal resource (memory & execution time) utilization. Maintainability: The ease with which software can be modified. Reliability: The probability of the software giving consistent results over a period of time. Flexibility: The ease with which software can be adapted for use in different contexts. Security: Prevention of unauthorised access. Interoperability: The ability of the software to integrate with existing systems. Performance: The ability of the software to deliver the outputs within the given constraints like time, accuracy, and memory usage.

Correctness is the most important attribute. Every software must be correct. The other attributes may be present in varying degrees. For example, it is an expensive proposition to make a software 100% reliable and it is not required in all contexts. If the software is going to be used in life critical situations, then 100% reliability is mandatory. But, say, in a weather monitoring system, a little less reliability may be acceptable. However, the final

decision lies with the client. One should keep in mind that some of the above attributes conflict with each other. For example, portability and efficiency could conflict with each other. To improve efficiency, one may resort to using system-dependent features, but that will affect the portability. In the days when DOS ruled the world, it was possible to access the internal architecture directly to improve performance. To port such a program to any other platform would require enormous changes. So in practice there will always be a tradeoff.

3. What is a Process 3.1 Exercise - 100% Inspection

3.2 What is a Process? A Process is a series of definable, repeatable, and measurable tasks leading to a useful result. The benefits of a well defined process are numerous. It provides visibility into a project. Visibility in turn aids timely mid-course corrections It helps developers to weed out faults at the point of introduction. This avoids cascading of faults into later phases It helps to organize workflow and outputs to maximize resource utilization It defines everybody's roles and responsibilities clearly. Individual productivity increases due to specialization and at the same time the team's productivity increases due to coordination of activities

A good software development process should: view software development as a value added business activity and not merely as a technical activity ensure that every product is checked to see if value addition has indeed taken place safeguard against loss of value once the product is complete provide management information for in-situ control of the process

To define such a process the following steps need to be followed:

Identify the phases of development and the tasks to be carried out in each phase Model the intra and inter phase transitions Use techniques to carry out the tasks Verify and Validate each task and the results Exercise process and project management skills

The words 'verify' and 'validate' need some clarification. Verify means to check if the task has been executed correctly, while validate means to check if the correct task has been executed. In the context of software, the process of checking if an algorithm has been implemented correctly, is verification, while the process of checking if the result of the algorithm execution is the solution to the desired problem, is validation. The generic phases that are normally used in a software development process are: Analysis: In this phase user needs are gathered and converted into software requirements. For example, if the user need is to generate the trajectory of a missile, the software requirement is to solve the governing equations. This phase should answer the question: what is to be done to meet user needs? Design: This phase answers the question: How to meet the user needs? With respect to the above example, design consists of deciding the algorithm to be used to solve the governing equations. The choice of the algorithm depends on design objectives like execution time, accuracy, etc. In this phase we determine organisation of various modules in the software system Construction: Coding is the main activity in this phase Testing: There are three categories of testing: unit testing, integration testing, and system testing. There are two types of testing: Black box testing and White box testing. Black box testing focuses on generating test cases based on requirements. White box testing focuses on generating test cases based on the internal logic of various modules Implementation
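To make the verification/validation distinction concrete, here is a small Python sketch built around the matrix-inversion requirement quoted in the Requirements Engineering module. Restricting it to 2 x 2 real matrices is an assumption made purely for brevity; in practice validation also involves the user, not just a check in code.

    def invert_2x2(m):
        # Verification asks: does this code implement the chosen algorithm
        # (the adjugate/determinant formula) correctly?
        (a, b), (c, d) = m
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is singular")
        return [[d / det, -b / det], [-c / det, a / det]]

    def matmul_2x2(x, y):
        return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    # Validation asks: does the result solve the user's stated problem,
    # i.e. does L M = M L = I_n hold for the given matrix M?
    M = [[4.0, 7.0], [2.0, 6.0]]
    L = invert_2x2(M)
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    for product in (matmul_2x2(L, M), matmul_2x2(M, L)):
        for i in range(2):
            for j in range(2):
                assert abs(product[i][j] - I2[i][j]) < 1e-9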

4. Software Life Cycle Models In practice, two types of software life cycle models are used - the sequential model and the iterative model. 4.1 Waterfall Model: The sequential model, also known as the waterfall model, is pictorially shown thus:

It represents the development process as a sequence of steps (phases). It requires that a phase is complete before the next phase is started. Because of the explicit recognition of phases and sequencing, it helps in contract finalisation with reference to delivery and payment schedules. In practice it is difficult to use this model as it is, because of the uncertainty in the software requirements. It is often difficult to envisage all the requirements a priori. If a mistake in understanding the requirements gets detected during the coding phase, then the whole process has to be started all over again. A working version of the software will not be available until late in the project life cycle. So, iteration both within a phase and across phases is a necessity. 4.2 Prototyping Prototyping is discussed in the literature as a separate approach to software development. Prototyping, as the name suggests, requires that a working version of the software is built early in the project life. There are two types of prototyping models, namely: the throw away prototype and the evolutionary prototype.

The objective of the throw-away prototyping model is to understand the requirements and solution methodologies better. The essence is speed; hence an ad hoc and quick development approach, with no thought to quality, is resorted to. It is akin to 'code and test'. However, once the objective is met, the code is discarded and fresh development is started, ensuring that quality standards are met. Since the requirements are now well understood, one could use the sequential approach. This model suffers from wastage of effort, in the sense that the developed code is discarded because it does not meet the quality standards. Evolutionary prototyping takes a different approach. The requirements are prioritised and code is developed for the most important requirements first, always with an eye on quality. The software is continuously refined and expanded with feedback from the client. The chief advantage of prototyping is that the client gets a feel of the product early in the project life cycle.

As can be seen, evolutionary prototyping is an iterative model. Such a model can be characterised as doing a little analysis, design, coding and testing, and repeating the process until the product is complete.

4.3 Spiral Model
Barry Boehm has suggested another iterative model called the spiral model. It is more in the nature of a framework, which needs to be adapted to specific projects. Pictorially it can be shown thus:

It allows the best mix of other approaches and focuses on eliminating errors and unattractive alternatives early. An important feature of this model is the stress on risk analysis. Once the objectives, alternatives, and constraints for a phase are identified, the risks involved in carrying out the phase are evaluated, which is expected to result in a 'go / no-go' decision. For evaluation purposes, one could use prototyping, simulations, etc. This model is best suited for projects which involve new technology development; risk analysis expertise is most critical for such projects.

4.4 ETVX Model
IBM introduced the ETVX model during the 1980s to document its processes. 'E' stands for the entry criteria which must be satisfied before a set of tasks can be performed, 'T' is the set of tasks to be performed, 'V' stands for the verification and validation process to ensure that the right tasks are performed, and 'X' stands for the exit criteria, i.e. the outputs of the tasks. If an activity fails its validation check, either corrective action is taken or rework is ordered. The model can be used in any development process: each phase in the process can be considered an activity and structured using the ETVX model, and if required the tasks can be further subdivided and each subtask structured using ETVX in turn (a sketch follows).
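One way to picture an ETVX-structured activity is as a record holding the four elements, together with a driver that enforces them in order. The sketch below is only illustrative; the 'Design' activity, its criteria and its tasks are invented for the example and are not taken from IBM's process documentation.

# Illustrative ETVX sketch (names and criteria are invented for the example).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ETVXActivity:
    name: str
    entry_criteria: List[Callable[[], bool]]   # 'E': must all hold before starting
    tasks: List[Callable[[], None]]            # 'T': the work itself
    validation: Callable[[], bool]             # 'V': check the right tasks were done
    exit_criteria: List[Callable[[], bool]]    # 'X': conditions/outputs needed to finish

    def run(self) -> bool:
        if not all(check() for check in self.entry_criteria):
            return False                       # entry criteria not met: do not start
        for task in self.tasks:
            task()
        if not self.validation():
            return False                       # corrective action or rework needed
        return all(check() for check in self.exit_criteria)

# Example: a 'Design' activity gated on an approved requirements document.
requirements_approved = lambda: True
design_doc = []
activity = ETVXActivity(
    name="Design",
    entry_criteria=[requirements_approved],
    tasks=[lambda: design_doc.append("module structure")],
    validation=lambda: len(design_doc) > 0,
    exit_criteria=[lambda: "module structure" in design_doc],
)
print(activity.run())   # True only when all four parts are satisfied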

4.5 Rational Unified Process Model
Among the modern process models, the Rational Unified Process (RUP) developed by Rational Corporation is noteworthy. It is an iterative model and captures many of the best practices of modern software development. RUP is explained more fully in the module OOAD with UML.

4.6 Agile Methodologies
All the methodologies described before are based on the premise that any software development process should be predictable and repeatable. One of the criticisms against these methodologies is that there is more emphasis on following procedures and preparing documentation; they are considered to be heavyweight or rigorous. They are also criticised for their excessive emphasis on structure. There is a movement, the Agile Software Movement, questioning this premise. Its proponents argue that software development is essentially a human activity: there will always be variations in processes and inputs, and the model should be flexible enough to handle them. For example, the entire set of software requirements cannot be known at the beginning of the project, nor does it remain static. If the model cannot handle this dynamism, there can be a lot of wasted effort, or the final product may not meet the customer's needs. Hence the agile methodologies advocate the principle "build short, build often": the given project is broken up into subprojects, and each subproject is developed and integrated into the already delivered system. This way the customer gets continuous delivery of useful and usable systems. The subprojects are chosen so that they have short delivery cycles, usually of the order of 3 to 4 weeks. The development team also gets continuous feedback. A number of agile methodologies have been proposed. The more popular among them are SCRUM, DYNAMIC SYSTEMS DEVELOPMENT METHOD (DSDM), CRYSTAL METHODS, FEATURE-DRIVEN DEVELOPMENT, LEAN DEVELOPMENT (LD) and EXTREME PROGRAMMING (XP). A short description of each of these methods follows:

SCRUM: It is a project management framework. It divides the development into short cycles called sprints, in each of which a specified set of features is delivered. It advocates daily team meetings for coordination and integration.

DYNAMIC SYSTEMS DEVELOPMENT METHOD (DSDM): It is characterised by nine principles:
1. Active user involvement
2. Team empowerment
3. Frequent delivery of products
4. Fitness for business purpose
5. Iterative and incremental development
6. All changes during development are reversible
7. Baselining of requirements at a high level
8. Integrated testing
9. Collaboration and cooperation between stakeholders

CRYSTAL METHODOLOGIES: They are a set of configurable methodologies. They focus on the people aspects of development. The configuration is carried out based on project size, criticality and objectives. Some of the names used for the methodologies are Clear, Yellow, Orange, Orange Web, Red, etc.

FEATURE-DRIVEN DEVELOPMENT (FDD): It is a short-iteration framework for software development. It focuses on building an overall object model, building a feature list, planning by feature, designing by feature, and building by feature.

LEAN DEVELOPMENT (LD): This methodology is derived from the principles of lean production, the restructuring of the Japanese automobile manufacturing industry that occurred in the 1980s. It is based on the following principles of lean thinking: eliminate waste, amplify learning, decide as late as possible, deliver as fast as possible, empower the team, build integrity in, and see the whole.

EXTREME PROGRAMMING (XP): This methodology is probably the most popular among the agile methodologies. It is based on three important principles, viz. test first, continuous refactoring, and pair programming (a test-first sketch follows).
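To make the 'test first' idea concrete, the fragment below writes the test for a piece of behaviour before the production code exists; only then is the simplest code written to make the test pass, after which it can be refactored while the test stays green. The shopping-cart example is invented for illustration and is not part of XP itself.

# Test-first sketch (the shopping-cart example is hypothetical).
# Step 1: write the test for behaviour that does not exist yet; it fails.
# Step 2: write the simplest Cart that makes the test pass.
# Step 3: refactor, keeping the test green.
import unittest

class Cart:
    def __init__(self):
        self._items = []

    def add(self, name: str, price: float) -> None:
        self._items.append((name, price))

    def total(self) -> float:
        return sum(price for _, price in self._items)

class CartTest(unittest.TestCase):
    def test_total_of_added_items(self):
        cart = Cart()
        cart.add("book", 12.50)
        cart.add("pen", 1.50)
        self.assertEqual(cart.total(), 14.00)

if __name__ == "__main__":
    unittest.main()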

One of the important concepts popularised by XP is pair programming: code is always developed in pairs, with one person keying in the code while the other reviews it. The paper by Laurie Williams et al. demonstrates the efficacy of pair programming, and the site agilealliance.com is dedicated to promoting agile software development methodologies.

5. How to Choose a Process
Among the plethora of available processes, how can we choose one? There is no single answer to this question. Probably the best way to attack this problem is to look at the software requirements.

If they are stable and well understood, then the waterfall model may be sufficient. If they are stable but not clear, then throw-away prototyping can be used. Where the requirements are changing, evolutionary prototyping is better. If the requirements are coupled with underlying business processes which are themselves going through change, then a model based on Boehm's spiral model, such as the Rational Unified Process, should be used. In these days of dynamic business environments, where 'time to market' is critical and project size is relatively small, an agile process should be chosen.

These are only guidelines. Many organisations choose a model and adapt it to their business requirements. For example, some organisations use the waterfall model modified to include iterations within phases.

6. Conclusions
The most important takeaway from this module is that software development should follow a disciplined process. The choice of the process should depend upon the stability of the requirements, the completeness of the requirements, the underlying business processes, the organisational structure, and the prevailing business environment.

7. References for Further Reading
In addition to the links provided, the following references may be consulted:
1. "An Integrated Approach to Software Engineering", by Pankaj Jalote, Springer-Verlag
2. "Software Engineering: A Practitioner's Approach", by Roger S. Pressman, McGraw-Hill, Inc.
3. "Software Engineering", by Ian Sommerville, Addison-Wesley Publishing Company
4. "The Rational Unified Process, An Introduction", by Philippe Kruchten, Addison-Wesley Publishing Company
5. "Agile Software Development Ecosystems", by Jim Highsmith, Addison-Wesley Publishing Company

Software Maintenance
1. What is Software Maintenance?
1.1 Introduction
Software maintenance is often considered to be (if it is considered at all) an unpleasant, time-consuming, expensive and unrewarding occupation - something that is carried out at the end of development only when absolutely necessary (and hopefully not very often). As such it is often considered to be the poor relation of software development: budgets often do not allow for it (or allow too little), and few programmers, given a choice, would choose to carry out maintenance over development work. This view, that software maintenance is a last resort, is largely born out of ignorance. Misconceptions, misunderstandings and myths concerning this crucial area of software engineering abound.

Software maintenance suffers from an image problem: although software has been maintained for years, relatively little is written about the topic. Little funding for research about software maintenance exists, and thus the academic community publishes relatively few papers on the subject. Maintenance organisations within business publish even less because of the corporate fear of giving away the "competitive edge". Although there are some textbooks on software maintenance, they are relatively few and far between (examples are included in the bibliography). Periodicals address the topic infrequently and few universities include software maintenance explicitly in their degree programmes. This lack of published information contributes to the misunderstandings and misconceptions that people have about software maintenance.

Part of the confusion about software maintenance relates to its definition: when it begins, when it ends and how it relates to development. Therefore it is necessary to first consider what is meant by the term software maintenance.

What is Software Maintenance? In order to define software maintenance we need to define exactly what is meant by software. It is a common misconception to believe that software is programs and that maintenance activities are carried out exclusively on programs. This is because many software maintainers are more familiar with, or rather are more exposed to, programs than other components of a software system; usually it is the program code that attracts most attention.

1.2 A Definition of Software
A more comprehensive view of software is given by McDermid (1991), who states that it consists of the programs, documentation and operating procedures by which computers can be made useful to man. Table 1.1 below depicts the components of a software system according to this view and includes some examples of each. McDermid's definition suggests that software is not only concerned with programs - source and object code - but also relates to documentation of any facet of the program, such as requirements analysis, specification, design, system and user manuals, and the

procedures used to set up and operate the software system. McDermid's is not the only definition of a software system, but it is comprehensive and widely accepted.

Table 1.1 Software components and examples
Component: Program
  Examples: source code; object code
Component: Documentation
  Examples: analysis/specification (formal specification, context diagram, data flow diagrams); design (flowcharts, entity-relationship charts); implementation (source code listings, cross-reference listing); testing (test data, test results)
Component: Operating Procedures
  Examples: instructions to set up and use the software system; instructions on how to react to system failures

1.3 A Definition of Maintenance
The use of the word maintenance to describe activities undertaken on software systems after delivery has been considered a misnomer, due to its failure to capture the evolutionary tendency of software products. Maintenance has traditionally meant the upkeep of an artifact in response to the gradual deterioration of parts due to extended use - that is, simply corrective maintenance. So, for example, one carries out maintenance on a car or a house usually to correct problems, e.g. replacing the brakes or fixing the leaking roof. If, however, we were to build an extension to the house or fit a sunroof to the car, those would usually be thought of as improvements rather than maintenance activities. To apply the traditional definition of maintenance in the context of software would therefore mean that software maintenance is only concerned with correcting errors. However, correcting errors accounts for only part of the maintenance effort. Consequently, a number of authors have advanced alternative terms that are considered to be more inclusive and encompass most, if not all, of the activities undertaken on existing software to keep it operational and acceptable to the users. These include software evolution, post-delivery evolution and support. However, it can be argued that there is nothing wrong with using the word maintenance, provided software engineers are educated to accept its meaning within the software engineering context regardless of what it means in non-software-engineering disciplines. After all, any work that needs to be done to keep a software system at a level considered useful to its users will still have to be carried out, regardless of the name it is given.

1.4 Definitions of Software Maintenance
As a result of the above, attempts have been made to develop a more comprehensive definition of maintenance which would be appropriate for use within the context of software systems. Some definitions focus on the particular activities carried out during maintenance. For example, according to Martin and McClure (1983), software maintenance must be performed in order to:
- Correct errors
- Correct design flaws
- Interface with other systems
- Make enhancements
- Make necessary changes to the system
- Make changes in files or databases
- Improve the design
- Convert programs so that different hardware, software, system features, and telecommunications facilities can be used

Others take a more general view, which considers software maintenance to be any work undertaken after delivery of a software system, for example: ". . . changes that have to be made to computer programs after they have been delivered to the customer or user" (Martin and McClure 1983); "Maintenance covers the life of a software system from the time it is installed until it is phased out" (von Mayrhauser 1990).

The problem with these definitions is that they fail to indicate what is actually done during maintenance. Also, the common theme of the above definitions is that maintenance is an "after-the-fact" activity: based on these definitions, no maintenance activities occur during the software development effort (the pre-delivery stage of the life cycle of a software system); maintenance occurs only after the product is in operation (during the post-delivery stage). Schneidewind (1987) believes that maintenance is difficult precisely because of this short-sighted view of maintenance as a post-delivery activity: during development, little consideration is given to the maintenance phase which is to follow, the developers considering their work done once the system is handed over to the users. Other perspectives include:
- the 'bug-fixing' view, which considers software maintenance as an activity involving the detection and correction of errors found in programs;
- the 'need-to-adapt' view, which sees maintenance as an activity that entails changing the software when its operational environment or original requirements change;
- the 'support to users' view, where maintenance of software is seen as providing a service to the users of the system.

The IEEE software maintenance standard, IEEE Std 1219-1993, which draws on these different views, defines software maintenance as: "Modification of a software product after delivery, to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment" (Van Edelstein 1993).

Software maintenance can also be defined as the activity associated with keeping operational computer systems continuously in tune with the requirements of users and data processing operations, or as any work done to a computer program after its first installation and implementation in an operational environment. The maintenance of software systems is motivated by a number of factors:
- To provide continuity of service: this entails fixing bugs, recovering from failures, and accommodating changes in the operating system and hardware.
- To support mandatory upgrades: these are usually caused by changes in government regulations, and also by attempts to maintain a competitive edge over rival products.
- To support user requests for improvements: examples include enhancement of functionality, better performance and customisation to local working patterns.
- To facilitate future maintenance work: this usually involves code and database restructuring, and updating documentation.

1.5 Maintenance Image Problems
The inclusion of the word maintenance in the term software maintenance has been linked to the negative image associated with the area. Higgins (1988) describes the problem: "...programmers ... tend to think of program development as a form of puzzle solving, and it is reassuring to their ego when they manage to successfully complete a difficult section of code. Software maintenance, on the other hand, entails very little new creation and is therefore categorised as dull, unexciting detective work." Similarly, Schneidewind (1987) contends that working in maintenance has been akin to having bad breath. Further, some authors argue that the general lack of consensus on software maintenance terminology has also contributed to the negative image associated with it.

2. Why is Software Maintenance Necessary?
2.1 Overview
In order to answer this question we need to consider what happens when the system is delivered to the users. The users operate the system and may find things wrong with it, or identify things they would like to see added to it. Via management feedback, the maintainer makes the approved corrections or improvements and the improved system is delivered to the users. The cycle then repeats itself, thus perpetuating the loop of maintenance and extending the life of the product. In most cases the maintenance phase ends up being the longest process of the entire life cycle, and so far outweighs the development phase in terms of time and cost. Figure 1.1 shows the lifecycle of maintenance on a software product and why (theoretically) it may be never ending.

Figure 1.1 The Maintenance Lifecycle

Lehman's (1980) first two laws of software evolution help explain why the operations and maintenance phase can be the longest of the life-cycle processes. His first law is the Law of Continuing Change, which states that a system needs to change in order to remain useful. The second law is the Law of Increasing Complexity, which states that the structure of a program deteriorates as it evolves. Over time, the structure of the code degrades until it becomes more cost-effective to rewrite the program.

3. Types of Software Maintenance
In order for a software system to remain useful in its environment it may be necessary to carry out a wide range of maintenance activities upon it. Swanson (1976) was one of the first to examine what really happens during maintenance and was able to identify three different categories of maintenance activity.

3.1 Corrective
Changes necessitated by actual errors (defects or residual "bugs") in a system are termed corrective maintenance. These defects manifest themselves when the system does not operate as it was designed or advertised to do. A defect or bug can result from design errors, logic errors and coding errors. Design errors occur when, for example, changes made to the software are incorrect, incomplete, wrongly communicated, or the change request is misunderstood. Logic errors result from invalid tests and conclusions, incorrect implementation of the design specification, faulty logic flow or incomplete test data. Coding errors are caused by incorrect implementation of the detailed logic design and incorrect use of the source code logic. Defects are also caused by data processing errors and system performance errors. All these errors, sometimes called residual errors or bugs, prevent the software from conforming to its agreed specification. In the event of a system failure due to an error, actions are taken to restore operation of the software system. The approach here is to locate the original specifications in order to

determine what the system was originally designed to do. However, due to pressure from management, maintenance personnel sometimes resort to emergency fixes known as patching. The ad hoc nature of this approach often gives rise to a range of problems, including increased program complexity and unforeseen ripple effects. Increased program complexity usually arises from degeneration of the program structure, which makes the program increasingly difficult, if not impossible, to comprehend. This state of affairs is sometimes referred to as the spaghetti syndrome or software fatigue. Unforeseen ripple effects imply that a change to one part of a program may affect other sections in an unpredictable fashion, often due to lack of time to carry out a thorough impact analysis before effecting the change. Corrective maintenance has been estimated to account for about 20% of all maintenance activities.

3.2 Adaptive
Any effort that is initiated as a result of changes in the environment in which a software system must operate is termed adaptive maintenance. Adaptive change is driven by the need to accommodate modifications in the environment of the software system, without which the system would become increasingly less useful until it became obsolete. The term environment in this context refers to all the conditions and influences which act from outside upon the system, for example business rules, government policies, work patterns, and software and hardware operating platforms. A change to the whole or part of this environment will warrant a corresponding modification of the software. Unfortunately, with this type of maintenance the user does not see a direct change in the operation of the system, but the software maintainer must still expend resources to effect the change. This task is estimated to consume about 25% of the total maintenance activity.

3.3 Perfective
The third widely accepted category is perfective maintenance. This is actually the most common type of maintenance, encompassing enhancements to both the function and the efficiency of the code; it includes all changes, insertions, deletions, modifications, extensions, and enhancements made to a system to meet the evolving and/or expanding needs of the user. A successful piece of software tends to be subjected to a succession of changes, resulting in an increase in its requirements. This is based on the premise that as the software becomes useful, the users tend to experiment with new cases beyond the scope for which it was initially developed. Expansion in requirements can take the form of enhancement of existing system functionality or improvement in computational efficiency. As the program continues to grow with each enhancement, the system evolves from an average-sized program of average maintainability into a very large program that offers great resistance to modification. Perfective maintenance is by far the largest consumer of maintenance resources; estimates of around 50% are not uncommon. The categories of maintenance above were further defined in the 1993 IEEE Standard on Software Maintenance (IEEE 1219 1993), which goes on to define a fourth category.

3.4 Preventive
The long-term effect of corrective, adaptive and perfective change is expressed in Lehman's law of increasing entropy: "As a large program is continuously changed, its complexity, which reflects deteriorating structure, increases unless work is done to maintain or reduce it" (Lehman 1985). The IEEE defined preventive maintenance as "maintenance performed for the purpose of preventing problems before they occur" (IEEE 1219 1993). This is the process of changing software to improve its future maintainability or to provide a better basis for future enhancements. The preventive change is usually initiated from within the maintenance organisation with the intention of making programs easier to understand and hence facilitating future maintenance work. Preventive change does not usually give rise to a substantial increase in the baseline functionality. Preventive maintenance is rare (only about 5%), the reason being that other pressures tend to push it to the end of the queue. For instance, a demand may come to develop a new system that will improve the organisation's competitiveness in the market; this will likely be seen as more desirable than spending time and money on a project that delivers no new function. Still, it is easy to see that, considering the probability of a software unit needing change and the time pressures that are often present when the change is requested, it makes a lot of sense to anticipate change and to prepare accordingly. The most comprehensive and authoritative study of software maintenance was conducted by B. P. Lientz and E. B. Swanson (1980). Figure 1.2 depicts the distribution of maintenance activities by category, as a percentage of time, from the Lientz and Swanson study of some 487 software organisations. Clearly, corrective maintenance (that is, fixing problems and routine debugging) is a small percentage of overall maintenance costs; Martin and McClure (1983) provide similar data.

Figure 1.2 Distribution of maintenance by categories

3.5 Maintenance as Ongoing Support
This category of maintenance work refers to the service provided to satisfy non-programming-related work requests. Ongoing support, although not a change in itself, is essential for successful communication of desired changes. The objectives of ongoing

support include effective communication between maintenance and end-user personnel, training of end users, and providing business information to users and their organisations to aid decision making. Effective communication is essential, as maintenance is probably the most customer-intensive part of the software life cycle: a greater proportion of maintenance effort is spent providing enhancements requested by customers than is spent on other types of system change. Good customer relations are important for several reasons and can lead to a reduction in the misinterpretation of users' change requests, a better understanding of users' business needs and increased user involvement in the maintenance process. Failure to achieve the required level of communication between the maintenance organisation and those affected by the software changes may eventually lead to software failure. Training of end users: typical services provided by the maintenance organisation include manuals, telephone support, help desks, on-site visits, informal short courses, and user groups. Business information: users need various types of timely and accurate business information (for example time, cost and resource estimates) to enable them to take strategic business decisions; questions such as "should we enhance the existing system or replace it completely?" may need to be considered.

4. The Importance of Categorising Software Changes
4.1 Overview
In principle, software maintenance activities can be classified individually. In practice, however, they are often intertwined. For example, in the course of modifying a program due to the introduction of a new operating system (adaptive change), obscure 'bugs' may be introduced; the bugs then have to be traced and dealt with (corrective maintenance). Similarly, the introduction of a more efficient sorting algorithm into a data processing package (perfective maintenance) may require that the existing program code be restructured (preventive maintenance). Figure 1.3 depicts the potential relations between the different types of software change. Despite the overlapping nature of these changes, there are several reasons why a good understanding of the distinction between them is important. Firstly, it allows management to set priorities for change requests: some changes require a faster response than others. Secondly, there are limitations to software change. Ideally changes are implemented as the need for them arises; in practice, however, this is not always possible, for several reasons:
- Resource limitations: some of the major hindrances to the quality and productivity of maintenance activities are the lack of skilled and trained maintenance programmers and of suitable tools and environments to support their work. Cost may also be an issue.
- Quality of the existing system: in some 'old' systems this can be so poor that any change can lead to unpredictable ripple effects and a potential collapse of the system.
- Organisational strategy: the desire to be on a par with other organisations, especially rivals, can be a great determinant of the size of a maintenance budget.
- Inertia: the resistance to change by users may prevent modification to a software product, however important or potentially profitable such change may be.

Thirdly, software is often subject to incremental release, where changes made to a software product are not all made together. The changes take place incrementally, with minor changes usually implemented while a system is in operation; major enhancements are usually planned and incorporated, together with other minor changes, in a new release or upgrade. The change-introduction mechanism also depends on whether the software package is bespoke or off-the-shelf. With bespoke software, change can often be effected as the need for it arises; for off-the-shelf packages, users normally have to wait for the next upgrade. Swanson's definitions allow the software maintenance practitioner to tell the user that a certain portion of a maintenance organisation's efforts is devoted to user-driven or environment-driven requirements. The user requirements should not be buried with other types of maintenance.

Fig 1.3 The relationship between the different types of software change

The point here is that these types of updates are not corrective in nature - they are improvements - and no matter which definitions are used, it is imperative to discriminate between corrections and enhancements. By studying the types of maintenance activity above it is clear that, regardless of which tools and development model are used, maintenance is needed. The categories clearly indicate that maintenance is more than fixing bugs. This view is supported by Jones (1991), who comments that organisations lump enhancements and the fixing of bugs together. He goes on to say that this distorts both activities and leads to confusion and mistakes in estimating the time it takes to implement changes, and in budgets. Even worse, this "lumping" perpetuates the notion that maintenance is fixing bugs and mistakes. Because many maintainers do not use maintenance categories, there is confusion and misinformation about maintenance.

5. A Comparison between Development and Maintenance
5.1 Overview
Although maintenance could be regarded as a continuation of development, there is a fundamental difference between the two activities. The constraints that the existing

system imposes on maintenance give rise to this difference. For example, in the course of designing an enhancement, the designer needs to investigate the current system to abstract the architectural and low-level designs. This information is then used to:
- ascertain how the change can be accommodated
- predict the potential ripple effect of the change
- determine the skills and knowledge required to do the job

To explain the difference between new development and software maintenance further, Jones (1986) provides an interesting analogy: "The task of adding functional requirements to existing systems can be likened to the architectural work of adding a new room to an existing building. The design will be severely constrained by the existing structure, and both the architect and the builders must take care not to weaken the existing structure when additions are made. Although the costs of the new room usually will be lower than the costs of constructing an entirely new building, the costs per square foot may be much higher because of the need to remove existing walls, reroute plumbing and electrical circuits, and take special care to avoid disrupting the current site" (quoted in Corbi 1989).

6. The Cost of Maintenance
6.1 Overview
Although there is no real agreement on the actual costs, sufficient data exist to indicate that maintenance does consume a large portion of overall software life-cycle costs. Arthur (1988) states that only a quarter to a third of all life-cycle costs are attributed to software development, and that some 67% of life-cycle costs are expended in the operations and maintenance phase of the life cycle. Jones (1994) states that maintenance will continue to grow and become the primary work of the software industry. Table 1.2 (Arthur 1988) provides a sample of data compiled by various people and organisations regarding the percentage of life-cycle costs devoted to maintenance. These data were collected in the late 1970s, prior to all the software engineering innovations, methods, and techniques that purport to decrease overall costs. However, despite software engineering innovations, recent literature suggests that maintenance is gaining more notoriety because of its increasing costs. A research marketing firm, the Gartner Group, estimated that U.S. corporations alone spend over $30 billion annually on software maintenance, and that in the 1990s, 95% of life-cycle costs would go to maintenance (Moad 1990; Figure 1.4). Clearly, maintenance is costly, and the costs are increasing. All the innovative software engineering efforts from the 1970s and 1980s have not reduced life-cycle costs.

Survey            Year   Maintenance (%)
Canning           1972   60
Boehm             1973   40-80
deRose/Nyman      1976   60-70
Mills             1976   75
Zeikowitz         1979   67
Cashman and Holt  1979   60-80

Table 1.2 Maintenance Costs as a Percentage of Total Software Life-cycle Costs

Today programmers' salaries consume the majority of the software budget, and most of their time is spent on maintenance, as it is a labour-intensive activity. As a result, organisations have seen the operations and maintenance phase of the software life cycle consume more and more resources over time. Others attribute rising maintenance costs to the age and lack of structure of the software. Osborne and Chikofsky (1990) state that much of today's software is ten to fifteen years old and was created without the benefit of the best design and coding techniques. The result is poorly designed structures, poor coding, poor logic, and poor documentation for the systems that must be maintained.

Figure 1.4 The percentage of software life-cycle costs devoted to maintenance

Over 75% of maintenance costs are for providing enhancements in the form of adaptive and perfective maintenance. These enhancements are significantly more expensive to complete than corrections, as they require major redesign work and considerably more coding than a corrective action. The result is that the user-driven enhancements (improvements) dominate the costs over the life cycle. Several later studies confirm that Lientz and Swanson's data from 1980 were still accurate in 1990. Table 1.3 summarises data from several researchers and shows that non-corrective work ranges from 78% to 84% of the overall effort; the majority of maintenance costs are therefore being spent on enhancements. Maintenance is expensive, then, because requirements and environments change, and the majority of maintenance costs are driven by users.

Maintenance Category   Lientz & Swanson 1980   Ball 1987   Deklava 1990   Abran 1990
Corrective             22%                     17%         16%            21%
Non-corrective         78%                     83%         84%            79%

Table 1.3 Percentage effort spent on Corrective and Non-Corrective maintenance

The situation at the turn of the millennium shows little sign of improvement.

The Maintenance Budget
Normally, after completing a lengthy and costly software development effort, organisations do not want to devote significant resources to post-delivery activities. Defining what is meant by a significant resource is in itself problematic: how much should maintenance cost? Underestimation of maintenance costs is partly human nature, as developers do not want to believe that maintenance for the new system will consume a significant portion of life-cycle costs. They hope that the new system will be the exception to the norm because modern software engineering techniques and methods were used, and that the software maintenance phase of the life cycle will therefore not, by definition, consume large amounts of money. Accordingly, sufficient amounts of money are often not allocated for maintenance; with limited resources, maintainers can only provide limited maintenance. The lack of financial resources for maintenance is due in large part to the lack of recognition that "maintenance" is primarily enhancing delivered systems rather than correcting bugs. Another factor influencing high maintenance costs is that needed items are often not included initially in the development phase, usually due to schedule or monetary constraints, but are deferred until the operations and maintenance phase. Maintainers therefore end up spending a large amount of their time coding the functions that were delayed until maintenance; as a result, development costs remain within budget but maintenance costs increase. As can be seen from Table 1.3, the maintenance categories are particularly useful when trying to explain the real costs of maintenance. If organisations have this data, they will understand why maintenance is expensive and will be able to defend their estimates of time to complete tasks and resources required.

7. A Software Maintenance Framework
7.1 Overview
To a large extent, the requirement for software systems to evolve in order to accommodate changing user needs contributes to the high maintenance cost. However, there are other factors which contribute indirectly by hindering maintenance activities. A Software Maintenance Framework (a derivative of the Software Maintenance Framework proposed by Haworth et al. 1992) will be used to discuss some of the factors that contribute to the maintenance challenge. The elements of this framework are the user requirements, the organisational and operational environments, the maintenance process, the maintenance personnel, and the software product (Table 1.4).

Table 1.4 Components of the Software Maintenance Framework
Component: Users and requirements
  Features: requests for additional functionality, error correction and improved maintainability; requests for non-programming-related support
Component: Organisational environment
  Features: change in policies; competition in the market place
Component: Operational environment
  Features: hardware innovations; software innovations
Component: Maintenance process
  Features: capturing requirements; variation in programming and working practices; paradigm shift; error detection and correction
Component: Software product
  Features: maturity and difficulty of application domain; quality of documentation; malleability of programs; complexity of programs; program structure; inherent quality
Component: Maintenance personnel
  Features: staff turnover; domain expertise

7.2 Users and their Requirements
Users often have little understanding of software maintenance and so can be unsupportive of the maintenance process. They may take the view that:
- software maintenance is like hardware maintenance
- changing software is easy
- changes cost too much and take too long

Users may also be unaware that their requests:
- may involve major structural changes to the software which take time to implement
- must be feasible, desirable, prioritised, scheduled and resourced
- may conflict with one another or with company policy, such that they are never implemented

7.3 Organisational and Operational Environment
An environmental factor is a factor which acts upon the product from outside and influences its form or operation. The two categories of environment are:
- the organisational environment, e.g. business rules, government regulations, taxation policies, work patterns, competition in the market place
- the operational environment, i.e. software systems (e.g. operating systems, database systems, compilers) and hardware systems (e.g. processor, memory, peripherals)

In this environment the scheduling of maintenance work can be problematic: urgent fixes always go to the head of the queue, thus upsetting schedules, and unexpected, mandatory large-scale changes will also need urgent attention. Further problems can stem from the organisational environment, in that the maintenance budget is often underfunded.

7.4 Maintenance Process
The term process here refers to any activity carried out or action taken, either by a machine or by maintenance personnel, during software maintenance. The facets of a maintenance process which affect the evolution of the software or contribute to maintenance costs include:
- The difficulty of capturing change (and changing) requirements: requirements and user problems only become clearer when a system is in use. Also, users may not be able to express their requirements in a form understandable to the analyst or programmer - the 'information gap'. The requirements and changes evolve, so the maintenance team is always playing catch-up.
- Variation in programming practice: this may present difficulties if there is no consistency, so standards or stylistic guidelines are often provided. Working practices impact the way a change is effected. The time to change can be adversely affected by clever code, undocumented assumptions, and undocumented design and implementation decisions. After some time, programmers find it difficult to understand even their own code.
- Paradigm shift: older systems developed prior to the advent of structured programming techniques may be difficult to maintain. However, existing programs may be restructured or 'revamped' using techniques and tools such as structured programming, object orientation, hierarchical program decomposition, reformatters and pretty-printers.
- Error detection and correction: error-free software is virtually non-existent; software products tend to have 'residual' errors. The later these errors are discovered, the more expensive they are to correct. The cost gets even higher if the errors are detected during the maintenance phase (Figure 1.5).

Figure 1.5 Cost of fixing errors increases in later phases of the life cycle

7.5 Software Product
Aspects of a software product that contribute to the maintenance challenge include:
- Maturity and difficulty of the application domain: the requirements of applications that have been widely used and well understood are less likely to undergo substantial modification than those that are still in their infancy.
- Inherent difficulty of the original problem: for example, programs dealing with simple problems such as sorting are obviously easier to handle than those used for more complex computations such as weather forecasting.
- Quality of the documentation: the lack of up-to-date system documentation affects maintenance productivity. Maintenance is difficult to perform because of the need to understand (or comprehend) program code. Program understanding is a labour-intensive activity that increases costs; IBM estimated that a programmer spends around 50% of his or her time on program analysis.
- Malleability of the programs: the malleable or 'soft' nature of software products makes them more vulnerable to undesirable modification than hardware items. Inadvertent software changes may have unknown and even fatal repercussions; this is particularly true of 'safety-related' or 'safety-critical' systems.
- Inherent quality: the tendency for a system to decay as more changes are undertaken implies that preventive maintenance needs to be undertaken to restore order to the programs.

7.6 Maintenance Personnel
This refers to the individuals involved in the maintenance of a software product. They include maintenance managers, analysts, designers, programmers and testers. The aspects of personnel that affect maintenance activities include the following:

- Staff turnover: due to high staff turnover, many systems end up being maintained by individuals who are not the original authors, and a substantial proportion of the maintenance effort is therefore spent just understanding the code. Staff who leave take irreplaceable knowledge with them.
- Domain expertise: staff may end up working on a system for which they have neither the system domain knowledge nor the application domain knowledge, and may therefore inadvertently cause ripple effects. This problem may be worsened by documentation that is absent, out of date or inadequate. The contrary situation is where a programmer becomes enslaved to a certain application because she/he is the only person who understands it.

Obviously the factors of product, environment, user and maintenance personnel do not exist in isolation but interact with one another. Three major types of relation and interaction can be identified: product/environment, product/user and product/maintenance personnel (Figure 1.6).
- Relationship between product and environment: as the environment changes, so must the product in order to remain useful.
- Relationship between product and user: in order for the system to stay useful and acceptable to its users, it also has to change to accommodate their changing requirements.
- Interaction between personnel and product: the maintenance personnel who implement changes also act as receptors of the changes; that is, they serve as the main avenue by which changes in the other factors - user requirements, maintenance process, organisational and operational environments - act upon the software product. The nature of the maintenance process used and the attributes of the maintenance personnel will affect the quality of the change.

Figure 1.6 Inter-relationship between maintenance factors

8. Potential Solutions to the Maintenance Problem

A number of possible solutions to maintenance problems have been suggested. They include:
- budget and effort reallocation
- complete replacement of the system
- maintenance of the existing system

8.1 Budget and Effort Reallocation
Based on the observation that software maintenance costs at least as much as new development, some authors have proposed that, rather than developing systems that are unmaintainable or difficult to maintain, more time and resources should be invested in the specification and design of more maintainable systems. However, this is difficult to pursue, and even if it were possible it would not address the problem of legacy systems that are already in a maintenance crisis.

8.2 Complete Replacement of the System
One might be tempted to suggest that if maintaining an existing system costs as much as developing a new one, then why not develop a new system from scratch? In practice, of course, it is not that simple, as there are costs and risks involved. The organisation may not be able to afford the capital outlay, and there is no guarantee that the new system will function any better than the old one. Additionally, the existing system represents a valuable knowledge base that could prove useful for the development of future systems, thereby reducing the chance of re-inventing the wheel. It would be unrealistic for an organisation to part with such an asset.

8.3 Maintenance of the Existing System
In most maintenance situations budget and effort reallocation is not possible, and completely redesigning the whole software system is usually undesirable (though nevertheless forced in some situations). Given that these approaches are beset with problems, maintaining the existing system is often the only alternative. Maintenance could be achieved, albeit in an ad hoc fashion, by making the necessary modifications as patches to the source code, often by hitting alien code head-on. However, this approach is fraught with difficulty and is usually adopted only because of time pressures; corrective maintenance is almost always done in this way for the same reason. An alternative and cost-effective solution is to apply preventive maintenance. Preventive maintenance may take several forms; it may:
- be applied to those sections of the software that are most often the target for change (this approach may be very cost-effective, as it has been estimated that 80% of changes affect only 30% of the code)
- involve updating inaccurate documentation
- involve restructuring, reverse engineering or the re-use of existing software components

Unfortunately, preventive maintenance is rare, the reason being that other pressures tend to push it to the end of the queue.

9. References
Arthur 1988: Arthur, L. J. Software Evolution: The Software Maintenance Challenge. John Wiley & Sons, New York, 1988.
ANSI/IEEE 1983: IEEE Standard Glossary of Software Engineering Terminology. Technical Report 729, 1983.
Corbi 1989: Corbi, T. A. Program Understanding: Challenge for the 1990's. IBM Systems Journal, 28(2):294-306, 1989.
Higgins 1988: Higgins, D. A. Data Structured Maintenance: The Warnier/Orr Approach. Dorset House Publishing Co. Inc., New York, 1988.
IEEE 1993: IEEE Standard 1219: Standard for Software Maintenance. Los Alamitos, CA: IEEE Computer Society Press, 1993.
Jones 1994: Jones, C. Assessment and Control of Software Risks. Englewood Cliffs, NJ: Prentice Hall, 1994.
Jones 1991: Jones, C. Applied Software Measurement. New York, NY: McGraw-Hill, 1991.
Lehman 1980: Lehman, M. M. On Understanding Laws, Evolution, and Conservation in the Large-Program Lifecycle. The Journal of Systems and Software, 1:213-221, 1980.
Lehman 1985: Lehman, M. M. Program Evolution. Academic Press, London, 1985.
Lientz & Swanson 1980: Lientz, B. P. & Swanson, E. B. Software Maintenance Management. Addison-Wesley Publishing Company, Reading, Massachusetts, 1980.
Martin & McClure 1983: Martin, J. & McClure, C. Software Maintenance: The Problem and its Solutions. Englewood Cliffs, NJ: Prentice Hall, 1983.
McDermid 1991: McDermid, J. Software Engineer's Reference Book, Chapter 16. Butterworth-Heinemann, 1991.
Moad 1990: Moad, J. Maintaining the Competitive Edge. Datamation, 61-6, 1990.
Osborne and Chikofsky 1990: Osborne, W. M. & Chikofsky, E. J. Fitting Pieces to the Maintenance Puzzle. IEEE Software, 10-11, 1990.
Pigoski 1996: Pigoski, T. M. Practical Software Maintenance: Best Practice for Managing your Software Investment. New York, NY: John Wiley and Sons, 1996.
Schneidewind 1987: Schneidewind, N. F. The State of Software Maintenance. IEEE Transactions on Software Engineering, SE-13(3):303-310, March 1987.
Swanson 1976: Swanson, E. B. The Dimensions of Maintenance. In Proceedings of the Second International Conference on Software Engineering, pages 492-497, San Francisco, October 1976.

Van Edelstein 1993: Van Edelstein, D. Report on the IEEE STD 1219-1993 - Standard for Software Maintenance. ACM SIGSOFT Software Engineering Notes, 18(4):94-95, October 1993.
von Mayrhauser 1990: von Mayrhauser, T. E. Software Engineering - Methods and Management. San Diego, CA: Academic Press, Inc., 1990.

System Design Overview


1. Introduction to Design
1.1 Introduction
Design is an iterative process of transforming the requirements specification into a design specification. Consider an example where Mrs. & Mr. XYZ want a new house. Their requirements include:

- a room for two children to play and sleep
- a room for Mrs. & Mr. XYZ to sleep
- a room for cooking
- a room for dining
- a room for general activities

and so on. An architect takes these requirements and designs a house. The architectural design specifies a particular solution; in fact, the architect may produce several designs to meet the requirements. For example, one design may maximize the children's room, while another minimizes it to allow a larger living room. In addition, the style of the proposed houses may differ: traditional, modern, two-storied. All of the proposed designs solve the problem, and there may not be a single best design. Software design can be viewed in the same way: we use the requirements specification to define the problem and transform this into a solution that satisfies all the requirements in the specification.

Some definitions of design:
- Devising artifacts to attain goals [H. A. Simon, 1981].
- The process of defining the architecture, components, interfaces and other characteristics of a system or component [IEEE 610.12].
- The process of applying various techniques and principles for the purpose of defining a device, a process or a system in sufficient detail to permit its physical realization [Webster's Dictionary].

Without design, the system will be:

- unmanageable, since there is no concrete output until coding; it is therefore difficult to monitor and control
- inflexible, since planning for long-term changes was not given due emphasis
- unmaintainable, since standards and guidelines for design and construction are not used; there is no reusability consideration, and poor design may result in tightly coupled modules with low cohesion; data disintegrity may also result
- inefficient, due to possible data redundancy and untuned code
- not portable to various hardware / software platforms

Design is different from programming. Design brings out a representation for the program, not the program or any component of it. The difference is tabulated below.

Design:
- Abstractions of operations and data ("what to do")
- Establishes interfaces
- Chooses between design alternatives
- Makes trade-offs w.r.t. constraints, etc.
- Devises a representation of the program

Programming:
- Construction of the program
- Devises algorithms and data representations
- Chooses functions and the syntax of the language
- Considers run-time environments

1.2 Qualities of a Good Design
Functional: This is a very basic quality attribute. Any design solution should work, and should be constructable.
Efficiency: This can be measured through:
- run time (time taken to undertake the whole processing task or transaction)
- response time (time taken to respond to a request for information)
- throughput (number of transactions per unit time)
- memory usage, size of executable, size of source, etc.

Flexibility: This is another basic and important attribute. The very purpose of carrying out design activities is to build systems that are modifiable in the event of any change in the requirements.
Portability & Security: These are to be addressed during design, so that such needs are not hard-coded later.
Reliability: This tells the goodness of the design - how successfully the system works (more important for real-time, mission-critical and on-line systems).

Economy: This can be achieved by identifying re-usable components.
Usability: Usability is in terms of how the interfaces are designed (clarity, aesthetics, directness, forgiveness, user control, ergonomics, etc.) and how much time it takes to master the system.

1.3 Design Constraints
Typical design constraints are:
- budget
- time
- integration with other systems
- skills
- standards
- hardware and software platforms

Budget and time cannot be changed. Problems with respect to integration with other systems (typically, the client may ask to use a proprietary database that he is already using) have to be studied and solutions found. Skills are alterable (for example, by arranging appropriate training for the team). Mutually agreed standards have to be adhered to. Hardware and software platforms may remain a constraint.

The designer tries to answer the 'how' part of the 'what' raised during the requirements phase. As such, the solution proposed should be contemporary, and to that extent a designer should know what is happening in technology. Large, central computer systems with proprietary architectures are being replaced by distributed networks of low-cost computers in an open-systems environment, and we are moving away from conventional software development based on hand generation of code (COBOL, C) towards integrated programming environments. Typical applications today are internet based.

1.4 Popular Design Methods
Popular design methods (Wasserman, 1995) include:

1) Modular decomposition
Based on assigning functions to components. It starts from the functions that are to be implemented and explains how each component will be organized and related to other components (a minimal sketch appears after this list).

2) Event-oriented decomposition Based on events that the system must handle. It starts with cataloging various states and then describes how transformations take place.

3) Object-oriented design
Based on objects and their interrelationships. It starts with object types and then explores object attributes and actions.

Structured Design - uses modular decomposition.

1.5 Transition from Analysis to Design

(Source: Pressman, R.S. Software Engineering: A Practitioner's Approach, fifth edition, McGraw-Hill, 2001.)

The data design transforms the data model created during analysis into the data structures that will be required to implement the software. The architectural design defines the relationships between the major structural elements of the software, the design patterns that can be used to achieve the requirements, and the constraints that affect the implementation. The interface design describes how the software communicates within itself, and with the humans who use it.

The procedural design (typically, Low Level Design) elaborates the structural elements of the software into procedural (algorithmic) descriptions.

2. High Level Design Activities

Broadly, High Level Design includes Architectural Design, Interface Design and Data Design.

2.1 Architectural Design

Shaw and Garlan (1996) suggest that software architecture is the first step in producing a software design. Architecture design associates the system capabilities with the system components (like modules) that will implement them. The architecture of a system is a comprehensive framework that describes its form and structure - its components and how they interact together. Generally, a complete architecture plan addresses the functions that the system provides, the hardware and network that are used to develop and operate it, and the software that is used to develop and operate it.

An architectural style involves its components, connectors, and constraints on combining components. Shaw and Garlan (1996) describe seven architectural styles. Commonly used styles include:
- Pipes and Filters
- Call-and-return systems
  o Main program / subprogram architecture
- Object-oriented systems
- Layered systems
- Data-centered systems
- Distributed systems
  o Client/Server architecture

In Pipes and Filters, each component (filter) reads streams of data on its inputs and produces streams of data on its output. Pipes are the connectors that transmit output from one filter to another. e.g. programs written in the Unix shell (a minimal C sketch of such a filter is given at the end of this subsection).

In Call-and-return systems, the program structure decomposes function into a control hierarchy where a main program invokes (via procedure calls) a number of program components, which in turn may invoke still other components. e.g. a Structure Chart is a hierarchical representation of a main program and its subprograms.

In Object-oriented systems, a component is an encapsulation of data and the operations that must be applied to manipulate the data. Communication and coordination between components is accomplished via message passing.

In Layered systems, each layer provides service to the one outside it, and acts as a client to the layer inside it. The layers are arranged like onion rings. e.g. the ISO OSI model.

Data-centered systems use repositories. A repository includes a central data structure representing the current state, and a collection of independent components that operate on the central data store.

In a traditional database, the transactions, in the form of an input stream, trigger process execution. e.g. a database.

A popular form of distributed system architecture is Client/Server, where a server system responds to requests for actions / services made by client systems. Clients access the server by remote procedure call.

For a detailed description of architecture styles, read [1] (OPTIONAL).

The following issues are also addressed during architecture design:
- Security
- Data processing: centralized / distributed / stand-alone
- Audit trails
- Restart / recovery
- User interface
- Other software interfaces
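To make the Pipes and Filters style mentioned above concrete, here is a minimal sketch of a Unix-style filter in C. It is only an illustration, not part of the original material: the component reads a stream on its standard input and writes a transformed stream on its standard output, so several such programs can be composed with pipes (e.g. cat data.txt | ./upcase | sort).

    /* upcase.c - a minimal filter: reads a character stream on stdin,
       writes the upper-cased stream on stdout.  It knows nothing about
       the filters before or after it in the pipeline. */
    #include <stdio.h>
    #include <ctype.h>

    int main(void)
    {
        int c;
        while ((c = getchar()) != EOF) {
            putchar(toupper(c));
        }
        return 0;
    }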

2.2 User Interface Design

The design of user interfaces draws heavily on the experience of the designer. Pressman [2] (refer Chapter 15) presents a set of Human-Computer Interaction (HCI) design guidelines that will result in a "friendly," efficient interface. The three categories of HCI design guidelines are:
1. General interaction
2. Information display
3. Data entry

General Interaction

Guidelines for general interaction often cross the boundary into information display, data entry and overall system control. They are, therefore, all-encompassing and are ignored at great risk. The following guidelines focus on general interaction.
o Be consistent. Use a consistent format for menu selection, command input, data display and the myriad other functions that occur in a HCI.
o Offer meaningful feedback. Provide the user with visual and auditory feedback to ensure that two-way communication (between user and interface) is established.
o Ask for verification of any nontrivial destructive action. If a user requests the deletion of a file, indicates that substantial information is to be overwritten, or asks for the termination of a program, an "Are you sure ..." message should appear.
o Permit easy reversal of most actions. UNDO or REVERSE functions have saved tens of thousands of end users from millions of hours of frustration. Reversal should be available in every interactive application.
o Reduce the amount of information that must be memorized between actions. The user should not be expected to remember a list of numbers or names so that he or she can re-use them in a subsequent function. Memory load should be minimized.

o Seek efficiency in dialog, motion, and thought. Keystrokes should be minimized, the distance a mouse must travel between picks should be considered in designing screen layout, and the user should rarely encounter a situation where he or she asks, "Now what does this mean?"
o Forgive mistakes. The system should protect itself from errors that might cause it to fail (defensive programming).
o Categorize activities by function and organize screen geography accordingly. One of the key benefits of the pull-down menu is the ability to organize commands by type. In essence, the designer should strive for "cohesive" placement of commands and actions.
o Provide help facilities that are context sensitive.
o Use simple action verbs or short verb phrases to name commands. A lengthy command name is more difficult to recognize and recall. It may also take up unnecessary space in menu lists.

Information Display

If information presented by the HCI is incomplete, ambiguous or unintelligible, the application will fail to satisfy the needs of the user. Information is "displayed" in many different ways: with text, pictures and sound; by placement, motion and size; using color, resolution, and even omission. The following guidelines focus on information display.
o Display only information that is relevant to the current context. The user should not have to wade through extraneous data, menus and graphics to obtain information relevant to a specific system function.
o Don't bury the user with data; use a presentation format that enables rapid assimilation of information. Graphs or charts should replace voluminous tables.
o Use consistent labels, standard abbreviations, and predictable colors. The meaning of a display should be obvious without reference to some outside source of information.
o Allow the user to maintain visual context. If computer graphics displays are scaled up and down, the original image should be displayed constantly (in reduced form at the corner of the display) so that the user understands the relative location of the portion of the image that is currently being viewed.
o Produce meaningful error messages.
o Use upper and lower case, indentation, and text grouping to aid in understanding. Much of the information imparted by a HCI is textual; yet the layout and form of the text has a significant impact on the ease with which information is assimilated by the user.
o Use windows to compartmentalize different types of information. Windows enable the user to "keep" many different types of information within easy reach.
o Use analog displays to represent information that is more easily assimilated with this form of representation.

For example, a display of holding-tank pressure in an oil refinery would have little impact if a numeric representation were used. However, if a thermometer-like display were used, vertical motion and color changes could be used to indicate dangerous pressure conditions. This would provide the user with both absolute and relative information.
o Consider the available geography of the display screen and use it efficiently. When multiple windows are to be used, space should be available to show at least some portion of each. In addition, screen size (a system engineering issue) should be selected to accommodate the type of application that is to be implemented.

Data Input

Much of the user's time is spent picking commands, typing data and otherwise providing system input. In many applications, the keyboard remains the primary input medium, but the mouse, digitizer and even voice recognition systems are rapidly becoming effective alternatives. The following guidelines focus on data input:
o Minimize the number of input actions required of the user. Reduce the amount of typing that is required. This can be accomplished by using the mouse to select from pre-defined sets of input; using a "sliding scale" to specify input data across a range of values; and using "macros" that enable a single keystroke to be transformed into a more complex collection of input data.
o Maintain consistency between information display and data input. The visual characteristics of the display (e.g., text size, color, placement) should be carried over to the input domain.
o Allow the user to customize the input. An expert user might decide to create custom commands or dispense with some types of warning messages and action verification. The HCI should allow this.
o Interaction should be flexible but also tuned to the user's preferred mode of input. The user model will assist in determining which mode of input is preferred. A clerical worker might be very happy with keyboard input, while a manager might be more comfortable using a point-and-pick device such as a mouse.
o Deactivate commands that are inappropriate in the context of current actions. This protects the user from attempting some action that could result in an error.
o Let the user control the interactive flow. The user should be able to skip unnecessary actions, change the order of required actions (when possible in the context of an application), and recover from error conditions without exiting from the program.
o Provide help to assist with all input actions.
o Eliminate "Mickey Mouse" input. Do not require the user to specify units for engineering input (unless there may be ambiguity), do not require the user to type ".00" for whole-number dollar amounts, provide default values whenever possible, and never require the user to enter information that can be acquired automatically or computed within the program.

3. Structured Design Methodology

The two major design methodologies are based on:
- Functional decomposition
- Object-oriented approach

3.1 Structured Design

Structured design is based on functional decomposition, where the decomposition is centered around the identification of the major system functions and their elaboration and refinement in a top-down manner. It typically follows from the dataflow diagrams and associated process descriptions created as part of Structured Analysis. Structured design uses the following strategies:
- Transform analysis
- Transaction analysis

and a few heuristics (like fan-in / fan-out, span of effect vs. scope of control, etc.) to transform a DFD into a software architecture (represented using a structure chart).

In structured design we functionally decompose the processes in a large system (as described in the DFD) into components (called modules) and organize these components in a hierarchical fashion (structure chart) based on the following principles:
- Abstraction (functional)
- Information Hiding
- Modularity

Abstraction "A view of a problem that extracts the essential information relevant to a particular purpose and ignores the remainder of the information." -- [IEEE, 1983] "A simplified description, or specification, of a system that emphasizes some of the system's details or properties while suppressing others. A good abstraction is one that emphasizes details that are significant to the reader or user and suppress details that are, at least for the moment, immaterial or diversionary." -- [Shaw, 1984] While decomposing, we consider the top level to be the most abstract, and as we move to lower levels, we give more details about each component. Such levels of abstraction provide flexibility to the code in the event of any future modifications. Information Hiding ... Every module is characterized by its knowledge of a design decision which it hides from all others. Its interface or definition was chosen to reveal as little as possible about its inner workings." --- [Parnas, 1972]

Parnas advocates that the details of difficult and likely-to-change decisions be hidden from the rest of the system. Further, the rest of the system will have access to these design decisions only through well-defined and (to a large degree) unchanging interfaces. This gives greater freedom to programmers: as long as the programmer sticks to the agreed interfaces, she has the flexibility to alter the component at any given point.

There are degrees of information hiding. For example, at the programming language level, C++ provides for public, private, and protected members, and Ada has both private and limited private types. In the C language, information hiding can be done by declaring a variable static within a source file (a small sketch appears after the step list below).

The difference between abstraction and information hiding is that the former (abstraction) is a technique that is used to help identify which information is to be hidden.

The concept of encapsulation as used in an object-oriented context is essentially different from information hiding. Encapsulation refers to building a capsule around some collection of things [Wirfs-Brock et al, 1990]. Programming languages have long supported encapsulation. For example, subprograms (e.g., procedures, functions, and subroutines), arrays, and record structures are common examples of encapsulation mechanisms supported by most programming languages. Newer programming languages support larger encapsulation mechanisms, e.g., "classes" in Simula, Smalltalk and C++, "modules" in Modula, and "packages" in Ada.

Modularity

Modularity leads to components that have clearly defined inputs and outputs, and each component has a clearly stated purpose. Thus, it is easy to examine each component separately from the others to determine whether the component implements its required tasks. Modularity also helps one to design different components in different ways, if needed. For example, the user interface may be designed with object orientation while the security design might use state-transition diagrams.

3.2 Strategies for converting the DFD into a Structure Chart

Steps [Page-Jones, 1988]:
- Break the system into suitably tractable units by means of transaction analysis
- Convert each unit into a good structure chart by means of transform analysis
- Link back the separate units into an overall system implementation
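Returning to the C-language remark above: the following is a minimal sketch (file and identifier names are purely illustrative) of hiding a design decision behind a file-scope static variable, so that other source files can use the counter only through its two access functions.

    /* counter.c - information hiding in C (illustrative sketch).
       The representation of the counter is a hidden design decision;
       it can change later (e.g. to a wider type or a persistent store)
       without affecting any caller that uses only the functions below. */
    #include <limits.h>

    static unsigned long count = 0;      /* not visible outside this file */

    void counter_increment(void)
    {
        if (count < ULONG_MAX)
            count++;
    }

    unsigned long counter_value(void)
    {
        return count;
    }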

Transaction Analysis The transaction is identified by studying the discrete event types that drive the system. For example, with respect to railway reservation, a customer may give the following transaction stimulus:

The three transaction types here are: Check Availability (an enquiry), Reserve Ticket (a booking) and Cancel Ticket (a cancellation). At any given time we will get customers interested in giving any of the above transaction stimuli. In a typical situation, any one stimulus may be entered through a particular terminal. The human user informs the system of her preference by selecting a transaction type from a menu. The first step in our strategy is to identify such transaction types and draw the first-level breakup of modules in the structure chart, by creating a separate module to coordinate the various transaction types. This is shown as follows:

Main(), which is the overall coordinating module, gets the information about which transaction the user prefers to perform through TransChoice, which is returned as a parameter to Main(). Remember, we are following our design principles faithfully in decomposing our modules. The actual details of how GetTransactionType() works are not relevant to Main(). It may, for example, refresh and print a text menu, prompt the user to select a choice and return this choice to Main(). No other component in our breakup will be affected even when this module is changed later to obtain the same input through a graphical interface instead of a textual menu. The modules Transaction1(), Transaction2() and Transaction3() are the coordinators of transactions one, two and three respectively. The details of these transactions are to be exploded in the next levels of abstraction. We will continue to identify more transaction centers by drawing a navigation chart of all input screens that are needed to get the various transaction stimuli from the user. These are to be factored out in the next levels of the structure chart (in exactly the same way as seen before), for all identified transaction centers. A minimal sketch of this transaction-centre arrangement is given below.
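The following C sketch follows the module names used in the discussion above; the transaction bodies are placeholders and the enum values are assumptions introduced only for illustration.

    /* A sketch of the transaction-centre breakup.  GetTransactionType()
       hides how the choice is obtained (text menu today, GUI later);
       Main() only dispatches on the returned code. */
    #include <stdio.h>

    typedef enum { CHECK_AVAILABILITY = 1, RESERVE_TICKET, CANCEL_TICKET, QUIT } TransChoice;

    TransChoice GetTransactionType(void)
    {
        int choice = 0;
        printf("1. Check Availability\n2. Reserve Ticket\n3. Cancel Ticket\n4. Quit\n> ");
        if (scanf("%d", &choice) != 1 || choice < 1 || choice > 4)
            choice = QUIT;                 /* treat bad input as Quit */
        return (TransChoice)choice;
    }

    void Transaction1(void) { /* coordinate the enquiry */ }
    void Transaction2(void) { /* coordinate the booking */ }
    void Transaction3(void) { /* coordinate the cancellation */ }

    int main(void)
    {
        TransChoice t;
        while ((t = GetTransactionType()) != QUIT) {
            switch (t) {
            case CHECK_AVAILABILITY: Transaction1(); break;
            case RESERVE_TICKET:     Transaction2(); break;
            case CANCEL_TICKET:      Transaction3(); break;
            default:                 break;
            }
        }
        return 0;
    }

Note how replacing the body of GetTransactionType() with a graphical front end would leave Main() and the three transaction coordinators untouched.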

Transform Analysis

Transform analysis is the strategy of converting each piece of the DFD (from level 2 or level 3, etc.) for each of the identified transaction centers. In case the given system has only one transaction (like a payroll system), then we can start the transformation from the level 1 DFD itself. Transform analysis is composed of the following five steps [Page-Jones, 1988]:
1. Draw a DFD of a transaction type (usually done during the analysis phase)
2. Find the central functions of the DFD
3. Convert the DFD into a first-cut structure chart
4. Refine the structure chart
5. Verify that the final structure chart meets the requirements of the original DFD

Let us understand these steps through a payroll system example.

Identifying the central transform

The central transform is the portion of the DFD that contains the essential functions of the system and is independent of the particular implementation of the input and output. One way of identifying the central transform (Page-Jones, 1988) is to identify the centre of the DFD by pruning off its afferent and efferent branches. An afferent stream is traced from outside of the DFD to a flow point inside, just before the input is transformed into some form of output (for example, a format or validation process only refines the input, it does not transform it). Similarly, an efferent stream is a flow point from where the output is

formatted for better presentation. The processes between the afferent and efferent streams represent the central transform (marked within dotted lines above). In the above example, P1 is an input process, and P6 & P7 are output processes. The central transform processes are P2, P3, P4 & P5, which transform the given input into some form of output.

First-cut Structure Chart

To produce the first-cut (first draft) structure chart, we first have to establish a boss module. A boss module can be one of the central transform processes; ideally, such a process should be more of a coordinating process (encompassing the essence of the transformation). In case we fail to find a boss module within, a dummy coordinating module is created.

In the above illustration, we have a dummy boss module, Produce Payroll, which is named in a way that indicates what the program is about. Having established the boss module, the afferent stream processes are moved to the leftmost side of the next level of the structure chart, the efferent stream processes to the rightmost side, and the central transform processes to the middle. Here, we moved a module to get valid timesheets (an afferent process) to the left side (indicated in yellow). Two central transform processes are moved to the middle (indicated in orange). By grouping the other two central transform processes with the respective efferent processes, we have created two modules (in blue), essentially to print results, on the right side.

The main advantage of a hierarchical (functional) arrangement of modules is that it leads to flexibility in the software. For instance, if the Calculate Deduction module is to select deduction rates from multiple rates, the module can be split into two at the next level - one to get the selection and another to calculate. Even after this change, the Calculate Deduction module would return the same value.

Refine the Structure Chart

Expand the structure chart further by using the different levels of the DFD. Factor down till you reach modules that correspond to processes that access a source / sink or data stores. Once this is ready, other features of the software, like error handling, security, etc., have to be added. A module name should not be used for two different modules. If the same module is to be used in more than one place, it should be demoted down so that fan-in can be done from the higher levels. Ideally, the name should sum up the activities done by the module and its subordinates.

Verify the Structure Chart vis-à-vis the DFD

Because of the orientation towards the end product (the software), the finer details of how data gets originated and stored (as they appear in the DFD) are not explicit in the Structure Chart. Hence the DFD may still be needed along with the Structure Chart to understand the data flow while creating the low-level design.

Constructing a Structure Chart (an illustration)

Some characteristics of the structure chart as a whole can give clues about the quality of the system. Page-Jones (1988) suggests the following guidelines for a good decomposition of a structure chart:
- Avoid decision splits - keep the span-of-effect within the scope-of-control: i.e. a module can affect only those modules which come under its control (all subordinates: immediate ones and modules reporting to them, etc.).
- An error should be reported from the module that both detects the error and knows what the error is.
- Restrict the fan-out (number of subordinates) of a module to seven.
- Increase fan-in (number of immediate bosses for a module). High fan-in (achieved in a functional way) improves reusability.

Refer to [Page-Jones, 1988: Chapters 7 & 10] for more guidelines and illustrations on structure charts.

3.3 How to measure the goodness of the design

To measure design quality, we use coupling (the degree of interdependence between two modules) and cohesion (the measure of the strength of functional relatedness of elements within a module). Page-Jones gives a good metaphor for understanding coupling and cohesion: consider two cities A & B, each having a big soda plant, C & D respectively. The employees of C live predominantly in city B and the employees of D in city A. What will happen

to the highway traffic between cities A & B? Placing the employees associated with a plant in the city where the plant is situated improves the situation (reduces the traffic). This is the basis of cohesion (which also automatically improves coupling).

COUPLING

Coupling is the measure of the strength of association established by a connection from one module to another [Stevens et al., 1974]. Minimizing connections between modules also minimizes the paths along which changes and errors can propagate into other parts of the system (the ripple effect). The use of global variables can result in an enormous number of connections between the modules of a program. The degree of coupling between two modules is a function of several factors [Stevens et al., 1974]: (1) how complicated the connection is, (2) whether the connection refers to the module itself or something inside it, and (3) what is being sent or received.

Table 1 summarizes the various types of coupling [Page-Jones, 1988]:

NORMAL coupling (acceptable)
- DATA coupling: two modules are data coupled if they communicate by parameters, each parameter being an elementary piece of data. e.g. sin(theta) returning the sine value; calc_interest(amt, interest_rate, term) returning the interest amount.
- STAMP coupling: two modules are stamp coupled if one passes to the other a composite piece of data (a piece of data with meaningful internal structure). e.g. calc_order_amt(PO_Details) returning the value of the order.
- CONTROL coupling: two modules are control coupled if one passes to the other a piece of information intended to control the internal logic of the other. e.g. print_report(what_to_print_flag).

COMMON (or GLOBAL) coupling (unacceptable)
- Two modules are common coupled if they refer to the same global data area. e.g. instead of communicating through parameters, the two modules use global data.

CONTENT (or PATHOLOGICAL) coupling (forbidden)
- Two modules are content coupled if one refers to the inside of the other in any way (if one module jumps inside another module). Jumping inside a module violates all the design principles - abstraction, information hiding and modularity.

A small sketch contrasting data coupling and control coupling follows.
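The sketch below is only an illustration based on the example names in the table above; the signatures are assumptions, not part of any real library.

    /* Data coupling: only elementary data items cross the interface. */
    double calc_interest(double amount, double rate, int term_in_years)
    {
        return amount * rate * term_in_years / 100.0;
    }

    /* Control coupling: the caller passes a flag that steers the internal
       logic of the called module, so the caller must know something about
       how print_report works inside - which is why this is less desirable. */
    enum report_kind { SUMMARY_REPORT, DETAILED_REPORT };

    void print_report(enum report_kind what_to_print_flag)
    {
        if (what_to_print_flag == SUMMARY_REPORT) {
            /* ... print the summary ... */
        } else {
            /* ... print every line item ... */
        }
    }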

We aim for loose coupling. We may come across a (rare) case of module A calling module B with no parameters passed between them (neither sent nor received). Strictly, this should be positioned at the zero point on the scale of coupling (lower than normal coupling itself) [Page-Jones, 1988]. Two modules A & B are normally coupled if A calls B, B returns to A, and all information passed between them is by means of parameters passed through the call mechanism. The other two types of coupling (common and content) are abnormal coupling and are not desired. Even in normal coupling we should take care of the following issues [Page-Jones, 1988]:

- Data coupling can become complex if the number of parameters communicated between modules is large.
- In stamp coupling there is always a danger of over-exposing irrelevant data to the called module. (Beware of the meaning of composite data: a name represented as an array of characters may not qualify as composite data. The meaning of composite data comes from the way it is used in the application, NOT from how it is represented in a program.)
- What-to-do flags are undesirable when they come from a called module (an inversion of authority): it is all right for the calling module (which, by virtue of the hierarchical arrangement, is the boss) to know the internals of the called module, but not the other way around.

In general, Page-Jones also warns of tramp data and hybrid coupling. When data is passed up and down merely to send it to a desired module, the data has no meaning at the intermediate levels; this leads to tramp data. Hybrid coupling results when different parts of a flag are used (misused?) to mean different things in different places (usually we may brand it as control coupling, but hybrid coupling complicates the connections between modules). Page-Jones advocates a way to distinguish data from control flags: data are named by nouns and control flags by verbs. Two modules may be coupled in more than one way; in such cases, their coupling is defined by the worst coupling type they exhibit [Page-Jones, 1988].

COHESION

Designers should aim for loosely coupled and highly cohesive modules. Coupling is reduced when the relationships among elements not in the same module are minimized. Cohesion, on the other hand, aims to maximize the relationships among elements in the same module. Cohesion is a good measure of the maintainability of a module [Stevens et al., 1974]. Stevens, Myers, Constantine, and Yourdon developed a scale of cohesion (from highest to lowest):
1. Functional cohesion (best)
2. Sequential cohesion
3. Communicational cohesion
4. Procedural cohesion
5. Temporal cohesion
6. Logical cohesion
7. Coincidental cohesion (worst)

Let us create a module that calculates the average of marks obtained by students in a class:

calc_stat() {            // only pseudo code
    read(x[])
    a = average(x)
    print a
}

average(m) {
    sum = 0
    for i = 1 to N {
        sum = sum + m[i]
    }
    return (sum / N)
}

In average() above, all of the elements are related to the performance of a single function. Such functional binding (cohesion) is the strongest type of binding. Suppose we need to calculate the standard deviation also; our pseudo code would then look like:

calc_stat() {            // only pseudo code
    read(x[])
    a = average(x)
    s = sd(x, a)
    print a, s
}

average(m) {             // same as before
}

sd(m, y) {               // function to calculate the standard deviation
}

Now, though average() and sd() are functionally cohesive, calc_stat() has a sequential binding (cohesion): like a factory assembly line, the functions are arranged in sequence and the output from average() goes as an input to sd(). Suppose we make sd() calculate the average also; then calc_stat() has two functions related by a reference to the same set of input data. This results in communicational cohesion. Let us now make calc_stat() into a procedure as below:

calc_stat() {
    sum = sumsq = count = 0
    for i = 1 to N {
        read(x[i])
        sum = sum + x[i]
        sumsq = sumsq + x[i]*x[i]
    }
    a = sum / N
    s = ...              // formula to calculate the standard deviation
    print a, s
}

Now, instead of binding functional units with data, calc_stat() binds activities through control flow: it has merged two statistical functions into one procedure. Obviously, this arrangement affects reuse of the module in a different context (for instance, when we need to calculate only the average and not the standard deviation). Such cohesion is called procedural. A good design for calc_stat() could be the following (data flow not shown):
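As a textual counterpart to the structure chart referred to above, here is a runnable C rendering of that "good design" - calc_stat (main) only coordinates, while average() and std_dev() each do one thing (functional cohesion). The fixed N and the use of standard input are assumptions for illustration.

    #include <stdio.h>
    #include <math.h>

    #define N 5

    double average(const double m[], int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += m[i];
        return sum / n;
    }

    double std_dev(const double m[], int n, double avg)
    {
        double sumsq = 0.0;
        for (int i = 0; i < n; i++)
            sumsq += (m[i] - avg) * (m[i] - avg);
        return sqrt(sumsq / n);
    }

    int main(void)          /* plays the role of calc_stat(): coordinates only */
    {
        double x[N];
        for (int i = 0; i < N; i++)
            if (scanf("%lf", &x[i]) != 1)
                return 1;

        double a = average(x, N);
        double s = std_dev(x, N, a);
        printf("average = %.2f, std. deviation = %.2f\n", a, s);
        return 0;
    }

Either function can now be reused on its own, which is exactly what the procedural version above prevents.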

In a temporally bound (cohesive) module, the elements are related in time. The best examples of modules of this type are the traditional initialization, termination, housekeeping, and clean-up modules. A logically cohesive module contains a number of activities of the same kind. To use the module, we may have to send a flag to indicate what we want (forcing the various activities

to share the interface). An example is a module that performs all input and output operations for a program. The activities in a logically cohesive module usually fall into the same category (validate all input, or edit all data), leading to sharing of common lines of code (a plate of spaghetti?). Suppose we have a module with all possible statistical measures (like average, standard deviation, mode, etc.). If we want to calculate only the average, the call to it would look like calc_all_stat(x[], flag1, flag2, para1, ...). The flags are used to indicate our intent, and some parameters will also be left blank. When there is no meaningful relationship among the elements in a module, we have coincidental cohesion.

Refer to [Page-Jones, 1988: Chapters 5 & 6] for more illustrations and exercises on coupling and cohesion.

4. Data Design

The data design can start from the ERD and the Data Dictionary. The first choice the designer has to make is between a file-based system and a database system (DBMS). File-based systems are easy to design and implement, and processing speed may be higher. The major drawback of this arrangement is that it leads to isolated applications with their own data, and hence high redundancy between applications. On the other hand, a DBMS allows you to plug your application on top of integrated, organization-wide data. This arrangement ensures minimum redundancy across applications. Change is easier to implement without performance degradation, and a restart/recovery feature (the ability to recover from hardware/software failures) is usually part of the DBMS. On the flip side, because of all these built-in features, a DBMS is quite complex and may be slower.

A file is a logical collection of records. Typically the fields in the records are the attributes. Files can be accessed either sequentially or randomly; files are organized using sequential, indexed, or indexed-sequential organization.

E.F. Codd introduced a technique called normalization to minimize redundancy in a table. Consider

R(St#, StName, Major, {C#, CTitle, Faculty, FacLoc, Grade})

with the following assumptions:
- Every course is handled by only one faculty
- Each course is associated with a particular faculty
- Faculty (name) is unique
- No student is allowed to repeat a course

This table needs to be converted to 1NF as there is more than one value in a cell. This is illustrated below:

After removing the repeating group, R becomes R1(St#, StName, Major) and R2(St#, C#, CTitle, Faculty, FacLoc, Grade), with St# and (St#, C#) as the respective keys, as illustrated below:

The primary key of the parent table needs to be included in the new table so as to ensure no loss of information. Now, R1 is in 2NF as it has no composite key. However, R2 has a composite key of St# and C#, which together determine Grade; but C# alone determines CTitle, Faculty and FacLoc - a partial dependency.

In order to convert R2 into 2NF we have to remove this partial dependency. R2 now becomes
R21(St#, C#, Grade)
R22(C#, CTitle, Faculty, FacLoc)
Now R21 & R22 are in 2NF, as illustrated below:

Now, R1 and R21 are in 3NF as well. However, in R22 there exists a transitive dependency, as Faculty determines FacLoc, shown below:

In order to convert R22 into 3NF we have to remove this transitive dependency. R22 now becomes
R221(C#, CTitle, Faculty*)
R222(Faculty, FacLoc)
assuming Faculty (name) is unique.

Now R221 and R222 are in 3NF. The final 3NF reduction for the given relation is:
Student(St#, StName, Major) - R1
Mark_List(St#, C#, Grade) - R21
Course(C#, CTitle, Faculty*) - R221
Faculty(Faculty, FacLoc) - R222

One way to picture these final relations as record types is sketched below.
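The following C declarations are only an illustrative sketch of the four 3NF relations (field sizes are arbitrary; the starred Faculty field in Course is the foreign key into the Faculty relation).

    struct Student   { int  st_no;   char st_name[40]; char major[20]; };
    struct Mark_List { int  st_no;   char c_no[8];     char grade[3];  };   /* key: (st_no, c_no) */
    struct Course    { char c_no[8]; char c_title[40]; char faculty[30]; }; /* faculty -> Faculty  */
    struct Faculty   { char faculty[30]; char fac_loc[20]; };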

De-normalization can be done to improve performance (as against flexibility). For example, Employee(Name, Door#, StreetName, Location, Pin) may become two tables during 3NF, with a separate table maintaining Pin information. If the application does not have frequent changes to Pin, then we can de-normalize these two tables back to the original form, allowing some redundancy in return for improved performance.

Guidelines for converting an ERD to tables:
- Each simple entity --> one table / file
- For super-type / sub-type entities:
  o a single table for all entities, or
  o separate tables for the sub- and super-types
- For an M:M relationship between entities, create one table / file for the relationship, with the primary keys of both entities forming a composite key.
- Add inherited attributes (foreign keys) depending on retrieval needs (typically on the M side of a 1:M relationship).

References
[1] Shaw, Mary & Garlan, David. Software Architecture: Perspectives on an Emerging Discipline, Prentice Hall, 1996.
[2] Roger S. Pressman. Software Engineering: A Practitioner's Approach, 5th edition, McGraw-Hill, 2000.
[3] Ben Shneiderman. Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edition, Addison-Wesley, 1998.
[4] [IEEE, 1983] IEEE, IEEE Standard Glossary of Software Engineering Terminology, The Institute of Electrical and Electronic Engineers, New York, New York, 1983.
[5] [Parnas, 1972] D.L. Parnas, "On the Criteria To Be Used in Decomposing Systems Into Modules," Communications of the ACM, Vol. 15, No. 12, December 1972, pp. 1053-1058.
[6] [Shaw, 1984] M. Shaw, "Abstraction Techniques in Modern Programming Languages," IEEE Software, Vol. 1, No. 4, October 1984, pp. 10-26.
[7] [Wirfs-Brock et al, 1990] R. Wirfs-Brock, B. Wilkerson, and L. Wiener, Designing Object-Oriented Software, Prentice-Hall, Englewood Cliffs, New Jersey, 1990.
[8] [Page-Jones, 1988] Page-Jones, Meilir. The Practical Guide to Structured Systems Design, Second Edition, Prentice Hall, 1988.
[9] [Stevens et al., 1974] W.P. Stevens, G.J. Myers, and L.L. Constantine, "Structured Design," IBM Systems Journal, Vol. 13, No. 2, 1974. (Reprinted in IBM Systems Journal, Vol. 38, Nos. 2 & 3, pp. 231-256, 1999.)

Recommended Text Books
[10] Pankaj Jalote. An Integrated Approach to Software Engineering, Narosa Pub. House, 1997.

[11] Meilir Page-Jones. The Practical Guide to Structured Systems Design, 2nd edition, Prentice-Hall, 1988.
[12] Date, C.J. An Introduction to Database Systems, 7th edition, Addison-Wesley, 2000.
[13] Ben Shneiderman. Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edition, Addison-Wesley, 1998.
[14] Grady Booch. Object-Oriented Analysis and Design With Applications, 2nd edition, Addison-Wesley, 1994.
[15] Garlan, D. and Shaw, M. An Introduction to Software Architecture, CMU Software Engineering Institute Report, CMU-CS-94-166, January 1994.

Testing and Debugging

1. Introduction to Testing

1.1 A Self-Assessment Test [1]

Take the following test before starting your learning: A program reads three integer values. The three values are interpreted as representing the lengths of the sides of a triangle. The program displays a message that states whether the given sides can make a scalene, isosceles or equilateral triangle (the Triangle Program). On a sheet of paper, write a set of test cases that would adequately test this program. Evaluate the effectiveness of your test cases using this list of common errors.

Testing is an important, mandatory part of software development; it is a technique for evaluating product quality and also for indirectly improving it, by identifying defects and problems [2].

What is common between these disasters?

- Ariane 5 explosion, 1996
- AT&T long distance service fails for nine hours, 1990
- Airbus downing during the Iran conflict, 1988
- Shutdown of nuclear reactors, 1979
- ...
Software faults!!

Refer to Prof. Thomas Huckle's site for a collection of software bugs: http://wwwzenger.informatik.tu-muenchen.de/persons/huckle/bugse.html and refer to http://www.cs.tau.ac.il/~nachumd/verify/horror.html for software horror stories!

1.2 How do we decide the correctness of software?

To answer this question, we need to first understand how software can fail. Failure, in the software context, is non-conformance to requirements. Failure may be due to one or more faults [3]:
- Error or incompleteness in the requirements

- Difficulty in implementing the specification in the target environment
- Faulty system or program design
- Defects in the code

From the variety of faults above (refer to Pfleeger [3] for an excellent discussion of the various types of faults), it is clear that testing cannot be seen as an activity that starts only after the coding phase: software testing is an activity that encompasses the whole development life cycle. We need to test a program to demonstrate the existence of faults. Contrary to what the term suggests, demonstrating "correctness" is not a demonstration that the program works properly; testing carries a negative connotation compared to our normal understanding. Myers' classic (which is still regarded as the best fundamental book on testing), The Art of Software Testing, lists the following as the most important testing principles:
- (Definition) Testing is the process of executing a program with the intent of finding errors.
- A good test case is one that has a high probability of detecting an as-yet undiscovered error.
- A successful test case is one that detects an as-yet undiscovered error.

Myers [1] discusses more testing principles:
- Test case definition includes expected output (a test oracle)
- Programmers should avoid testing their own programs (third-party testing)
- Inspect the result of each test
- Include test cases for invalid and unexpected input conditions
- See whether the program does what it is not supposed to do (errors of commission)
- Avoid throw-away test cases (test cases serve as documentation for future maintenance)
- Do not assume that the program is bug free (a non-developer, rather destructive, mindset is needed)
- The probability of more errors is proportional to the number of errors already found

Consider the following diagram:

If we abstract a software to be a mathematical function f, which is expected to produce various outputs for inputs from a domain (represented diagrammatically above), then it is clear from the definition of Testing by Myers that Testing cannot prove correctness of a program it is just a series of experiments to find out errors (as f is usually a discrete function that maps the input domain to various outputs that can be observed by executing the program) There is nothing like 100% error-free code as it is not feasible to conduct exhaustive testing, proving 100% correctness of a program is not possible.

We need to develop an attitude for egoless programming and keep a goal of eliminating as many faults as possible. Statistics on review effectiveness and common sense says that

prevention is better than cure. We need to place static testing also in place to capture an error before it becomes a defect in the software. Recent agile methodologies like extreme programming addresses these issues better with practices like test-driven programming and paired programming (to reduce the psychological pressure on individuals and to bring review part of coding) [4] 1.3 Testing Approaches There are two major approaches to testing: Black-box (or closed box or data-driven or input/output driven or behavioral) testing White-box (or clear box or glass box or logic-driven) testing

If the testing component is viewed as a black-box, the inputs have to be given to observe the behavior (output) of the program. In this case, it makes sense to give both valid and invalid inputs. The observed output is then matched with expected result (from specifications). The advantage of this approach is that the tester need not worry about the internal structure of the program. However, it is impossible to find all errors using this approach. For instance, if one tried three equilateral triangle test cases for the triangle program, we cannot be sure whether the program will detect all equilateral triangles. The program may contain a hard coded display of scalene triangle for values (300,300,300). To exhaustively test the triangle program, we need to create test cases for all valid triangles up to MAXIMUM integer size. This is an astronomical task but still not exhaustive (Why?) To be sure of finding all possible errors, we not only test with valid inputs but all possible inputs (including invalid ones like characters, float, negative integers, etc.). White-box approach examines the internal structure of the program. This means logical analysis of software element where testing is done to trace all possible paths of control flow. This method of testing exposes both errors of omission (errors due to neglected specification) and also errors of commission (something not defined by the specification). Let us look at a specification of a simple file handling program as given below. The program has to read employee details such as employee id, name, date of joining and department as input from the user, create a file of employees and display the records sorted in the order of employee ids. Examples for errors of omission: Omission of Display module, Display of records not in sorted order of employee ids, file created with fewer fields etc. Example of error of commission: Additional lines of code deleting some arbitrary records from the created file. (No amount of black box testing can expose such errors of commission as the method uses specification as the reference to prepare test cases. ) However, it is many a time practically impossible to do complete white box testing to trace all possible paths of control flow as the number of paths could astronomically large. For instance, a program segment which has 5 different control paths (4 nested if-then-else) and if this segment is iterated 20 times, the number of unique paths would be 520+519++51 = 1014 or 100 trillion [1]. If we were able to complete one test case every

5 minutes, it would take approximately one billion years to test every unique path. Due to dependency of decisions, not all control paths may be feasible. Hence, actually, we may not be testing all these paths. Even if we manage to do an exhaustive testing of all possible paths, it may not guarantee that the program work as per specification: Instead of sorting in descending order (required by the specification), the program may sort in ascending order. Exhaustive path testing will not address missing paths and data-sensitive errors. A black-box approach would capture these errors! In conclusion, It is not feasible to do exhaustive testing either in block or in white box approaches. None of these approaches are superior meaning, one has to use both approaches, as they really complement each other. Not, but not the least, static testing still plays a large role in software testing.

The challenge, then, lies in using a right mix of all these approaches and in identifying a subset of all possible test cases that have highest probability of detecting most errors. The details of various techniques under black and white box approach are covered in Test Techniques. References [1] Myers, G.J. The Art of Software Testing, John Wiley & Sons, 1979. [2] Bertolino, A. Software Testing, in Pierre Bourque and Robert Dupuis (eds.). Guide to the Software Engineering Body of Knowledge (SWEBOK), IEEE, 2001. [3] Pfleeger, S.L. Software Engineering: Theory and Practice, 2nd edition, Prentice Hall, 2001. [4] Kent Beck. Extreme Programming Explained: Embrace Change, Addison-Wesley, 1999. 2. Levels of Testing 2.1 Overview In developing a large system, testing usually involves several stages (Refer the following figure [2]). Unit Testing Integration Testing System Testing Acceptance Testing

Initially, each program component (module) is tested independently verifying the component functions with the types of input identified by studying components design. Such a testing is called Unit Testing (or component or module testing). Unit testing is done in a controlled environment with a predetermined set of data fed into the component to observe what output actions and data are produced. When collections of components have been unit-tested, the next step is ensuring that the interfaces among the components are defined and handled properly. This process of verifying the synergy of system components against the program Design Specification is called Integration Testing. Once the system is integrated, the overall functionality is tested against the Software Requirements Specification (SRS). Then, the other non-functional requirements like performance testing are done to ensure readiness of the system to work successfully in a customers actual working environment. This step is called System Testing. The next step is customers validation of the system against User Requirements Specification (URS). Customer in their working environment does this exercise of Acceptance Testing usually with assistance from the developers. Once the system is accepted, it will be installed and will be put to use. 2.2 Unit Testing Pfleeger [2] advocates the following steps to address the goal of finding faults in modules (components): Examining the code Typically the static testing methods like Reviews, Walkthroughs and Inspections are used (Refer RWI course)

Proving code correct o After coding and review exercise if we want to ascertain the correctness of the code we can use formal methods. A program is correct if it implements the functions and data properly as indicated in the design, and if it interfaces properly with all other components. One way to investigate program correctness is to view

the code as a statement of logical flow. Using mathematical logic, if we can formulate the program as a set of assertions and theorems, we can show that the truth of the theorems implies the correctness of the code. o Use of this approach forces us to be more rigorous and precise in specification. Much work is involved in setting up and carrying out the proof. For example, the code for performing bubble sort is much smaller than its logical description and proof.

Testing program components (modules) o In the absence of simpler methods and automated tools, Proving code correctness will be an elusive goal for software engineers. Proving views programs in terms of classes of data and conditions and the proof may not involve execution of the code. On the contrary, testing is a series of experiments to observe the behaviour of the program for various input conditions. While proof tells us how a program will work in a hypothetical environment described by the design and requirements, testing gives us information about how a program works in its actual operating environment. o To test a component (module), input data and conditions are chosen to demonstrate an observable behaviour of the code. A test case is a particular choice of input data to be used in testing a program. Test case are generated by using either black-box or white-box approaches (Refer Test Techniques)

2.3 Integration Testing Integration is the process of assembling unit-tested modules. We need to test the following aspects that are not previously addressed while independently testing the modules: Interfaces: To ensure interface integrity, the transfer of data between modules is tested. When data is passed to another module, by way of a call, there should not be any loss or corruption of data. The loss or corruption of data can happen due to mis-match or differences in the number or order of calling and receiving parameters. Module combinations may produce a different behaviour due to combinations of data that are not exercised during unit testing. Global data structures, if used, may reveal errors due to unintended usage in some module.

Integration Strategies Depending on design approach, one of the following integration strategies can be adopted: Big Bang approach Incremental approach Top-down testing Bottom-up testing Sandwich testing

To illustrate, consider the following arrangement of modules:

Big Bang approach consists of testing each module individually and linking all these modules together only when every module in the system has been tested.

Though Big Bang approach seems to be advantageous when we construct independent module concurrently, this approach is quite challenging and risky as we integrate all modules in a single step and test the resulting system. Locating interface errors, if any, becomes difficult here. The alternative strategy is an incremental approach, wherein modules of a system are consolidated with already tested components of the system. In this way, the software is gradually built up, spreading the integration testing load more evenly through the construction phase. Incremental approach can be implemented in two distinct ways: Topdown and Bottom-up. In Top-down testing, testing begins with the topmost module. A module will be integrated into the system only when the module which calls it has been already integrated successfully. An example order of Top-down testing for the above illustration will be:

The testing starts with M1. To test M1 in isolation, communications to modules M2, M3 and M4 have to be somehow simulated by the tester somehow, as these modules may not be ready yet. To simulate responses of M2, M3 and M4 whenever they are to be invoked from M1, stubs are created. Simple applications may require stubs which simply return control to their superior modules. More complex situation demand stubs to simulate a full

range of responses, including parameter passing. Stubs may be individually created by the tester (as programs in their own right) or they may be provided by a software testing harness, which is a piece of software specifically designed to provide a testing environment. In the above illustration, M1 would require stubs to simulate the activities of M2, M3 and M4. The integration of M3 would require a stub or stubs (?!) for M5 and M4 would require stubs for M6 and M7. Elementary modules (those which call no subordinates) require no stubs. Bottom-up testing begins with elementary modules. If M5 is ready, we need to simulate the activities of its superior, M3. Such a driver for M5 would simulate the invocation activities of M3. As with the stub, the complexity of a driver would depend upon the application under test. The driver would be responsible for invoking the module under test, it could be responsible for passing test data (as parameters) and it might be responsible for receiving output data. Again, the driving function can be provided through a testing harness or may be created by the tester as a program. The following diagram shows the bottom-up testing approach for the above illustration:

For the above example, driver must be provided for modules M2, M5, M6, M7, M3 and M4. There is no need for a driver for the topmost node, M1. Myers [1] lists the advantages and disadvantages of Top-down testing and Bottom-up testing: Testing Advantages TopDown Advantageous if major flaws occur toward the top of the program Early skeletal program allows demonstrations and boosts morale Disadvantages Stub modules must be produced Test conditions my be impossible, or very difficult, to create Observation of test output is more difficult, as

only simulated values will be used initially. For the same reason, program correctness can be misleading

Bottomup

Advantageous if major flaws occur toward the bottom of the program Test conditions are easier to create Observations of test results is easier (as live data is used from the beginning)

Driver modules must be produced The program as an entity does not exist until the last module is added

To overcome the limitations and to exploit the advantages of Top-down and Bottom-up testing, a sandwich testing is used [2]: The system is viewed as three layers the target layer in the middle, the levels above the target, and the levels below the target. A topdown approach is used in the top layer and a bottom-up one in the lower layer. Testing converges on the target layer, chosen on the basis of system characteristics and the structure of the code. For example, if the bottom layer contains many general-purpose utility programs, the target layer (the one above) will be components using the utilities. This approach allows bottom-up testing to verify the utilities correctness at the beginning of testing. Choosing an integration strategy [2] depends not only on system characteristics, but also on customer expectations. For instance, the customer may want to see a working version as soon as possible, so we may adopt an integration schedule that produces a basic working system early in the testing process. In this way coding and testing can go concurrently. 2.4 System Testing The objective of unit and integration testing was to ensure that the code implemented the design properly. In system testing, we need to ensure that the system does what the customer wants it to do. Initially the functions (functional requirements) performed by the system are tested. A function test checks whether the integrated system performs its functions as specified in the requirements. After ensuring that the system performs the intended functions, the performance test is done. This non-functional requirement includes security, accuracy, speed, and reliability. System testing begins with function testing. Since the focus here is on functionality, a black-box approach is taken (Refer Test Techniques). Function testing is performed in a controlled situation. Since function testing compares the systems actual performance with its requirements, test cases are developed from requirements document (SRS). For example [2], a word processing system can be tested by examining the following functions: document creation, document modification and document deletion. To test document modification, adding a character, adding a word, adding a paragraph, deleting a

character, deleting a word, deleting a paragraph, changing the font, changing the type size, changing the paragraph formatting, etc. are to be tested. Performance testing addresses the non-functional requirements. System performance is measured against the performance objectives set by the customer. For example, function testing may have demonstrated how the system handles deposit or withdraw transactions in a bank account package. Performance testing evaluates the speed with which calculations are made, the precision of the computation, the security precautions required, and the response time to user inquiry. Types of Performance Tests [2] 1. Stress tests evaluates the system when stressed to its limits. If the requirements state that a system is to handle up to a specified number of devices or users, a stress test evaluates system performance when all those devices or users are active simultaneously. This test brings out the performance during peak demand. 2. Volume tests addresses the handling of large amounts of data in the system. This includes Checking of whether data structures have been defined large enough to handle all possible situations, Checking the size of fields, records and files to see whether they can accommodate all expected data, and Checking of systems reaction when data sets reach their maximum size.

3. Configuration tests analyze the various software and hardware configurations specified in the requirements (e.g. a system that must serve a variety of audiences).

4. Compatibility tests are needed when a system interfaces with other systems (e.g. a system that retrieves information from a large database system).

5. Regression tests are required when the system being tested is replacing an existing system. They are always used during phased development to ensure that the new system's performance is at least as good as that of the old.

6. Security tests verify the security requirements, testing characteristics related to availability, integrity, and confidentiality of data and services.

7. Timing tests cover response time, transaction time, and so on. They are usually done together with stress tests to see whether the timing requirements are met even when the system is extremely active.

8. Environmental tests look at the system's ability to perform at the installation site. If the requirements include tolerances to heat, humidity, motion, chemical presence, moisture, portability, electrical or magnetic fields, disruption of power, or any other environmental characteristics of the site, the tests should ensure that the system performs under these conditions.

9. Quality tests evaluate the system's reliability, maintainability, and availability. These tests include calculation of mean time to failure and mean time to repair, as well as average time to find and fix a fault.

10. Recovery tests address the response to the loss of data, power, devices or services. The system is subjected to the loss of system resources and tested to see whether it recovers properly.

11. Maintenance tests address the need for diagnostic tools and procedures that help in finding the source of problems. They verify the existence and functioning of aids such as diagnostic programs, memory maps and traces of transactions.

12. Documentation tests ensure that documents such as user guides, maintenance guides and technical documentation exist, and verify the consistency of the information in them.

13. Human factors (or usability) tests investigate requirements related to the user interface. Display screens, messages, report formats and other aspects are examined for ease of use.

2.5 Acceptance Testing

Acceptance testing is the customer (and user) evaluation of the system, primarily to determine whether the system meets their needs and expectations. Usually the acceptance test is done by the customer with assistance from the developers. Customers can evaluate the system either by conducting a benchmark test or by a pilot test [2]. In a benchmark test, system performance is evaluated against test cases that represent the typical conditions under which the system will operate when actually installed. A pilot test installs the system on an experimental basis, and the system is evaluated against everyday working use. Sometimes the system is piloted in-house before the customer runs the real pilot test; the in-house test in such a case is called an alpha test, and the customer's pilot is a beta test. This approach is common for commercial software that has to be released to a wide variety of customers. A third approach, parallel testing, is used when a new system is replacing an existing one or is part of a phased development. The new system is put to use in parallel with the previous version; this facilitates a gradual transition of users and allows the new system to be compared and contrasted with the old.

References
[1] Myers, G.J. The Art of Software Testing, John Wiley & Sons, 1979.
[2] Pfleeger, S.L. Software Engineering: Theory and Practice, 2nd edition, Prentice Hall, 2001.

3. Test Techniques

We shall discuss the Black Box and White Box approaches.

3.1 Black Box Approach
- Equivalence Partitioning
- Boundary Value Analysis

- Cause Effect Analysis
- Cause Effect Graphing
- Error Guessing

I. Equivalence Partitioning

Equivalence partitioning is the partitioning of the input domain of a system into a finite number of equivalence classes in such a way that testing one representative from a class is equivalent to testing any other value from that class. To put it in simpler words: since it is practically infeasible to do exhaustive testing, the next best alternative is to check whether the program extends similar behaviour or treatment to a certain group of inputs. If such a group of values can be found in the input domain, treat them together as one equivalence class and test one representative from it. This can be explained with the following example.

Consider a program which takes Salary as input, with values 12000...37000 in the valid range. The program calculates tax as follows:
- Salary up to Rs. 15000: no tax
- Salary between 15001 and 25000: tax is 18% of Salary
- Salary above 25000: tax is 20% of Salary

Here, the specification contains a clue that certain groups of values in the input domain are treated equivalently by the program. Accordingly, the valid input domain can be divided into three valid equivalence classes as below:
c1: values in the range 12000...15000
c2: values in the range 15001...25000
c3: values > 25000

However, it is not sufficient to test only valid cases. We need to test the program with invalid data as well, since the users of the program may give invalid inputs, intentionally or unintentionally. It is easy to identify an invalid class c4: values < 12000. If we assume some maximum limit (MAX) for the variable Salary, we can modify class c3 above to values in the range 25001...MAX and identify an invalid class c5: values > MAX. Depending on the system, MAX may be defined either by the specification or by hardware or software constraints later during the design phase.

If we expand the discussion further and assume that the user or tester of the program may give as input any value which can be typed in through the keyboard, we can form the equivalence classes as explained below. Since the input has to be a salary, it can be seen intuitively that numeric and non-numeric values are treated differently by the program. Hence we can form two classes:
- the class of non-numeric values
- the class of numeric values

Since all non-numeric values are treated as invalid by the program, the class of non-numeric values need not be further subdivided.

The class of numeric values needs further subdivision, as not all elements of the class are treated alike by the program. Again, within this class, if we look for groups of values meeting with similar treatment from the program, the following classes can be identified:
- values < 12000
- values in the range 12000...15000
- values in the range 15001...25000
- values in the range 25001...MAX
- values > MAX

These equivalence classes need not be further subdivided, as the program should treat all values within each class in a similar manner. The equivalence classes identified for the given specification, along with a set of sample test cases designed using them, are shown in the following table (the Actual/Observed Result and Remarks columns are filled in during test execution):

Class                               Input Condition (Salary)                Expected Result (Tax amount)
c1 - class of non-numeric values    A non-numeric value                     Error Msg: Invalid Input
c2 - values < 12000                 A numeric value < 12000                 Error Msg: Invalid Input
c3 - values in 12000...15000        A numeric value >= 12000 and <= 15000   No Tax
c4 - values in 15001...25000        A numeric value >= 15001 and <= 25000   Tax = 18% of Salary
c5 - values in 25001...MAX          A numeric value >= 25001 and <= MAX     Tax = 20% of Salary
c6 - values > MAX                   A numeric value > MAX                   Error Msg: Invalid Input
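As a concrete illustration, the sketch below exercises one representative per numeric equivalence class. It is a minimal sketch, assuming the tax rules from the example; the function name computeTax(), the convention of returning -1 for invalid input, and MAX = 37000 are illustrative assumptions, not part of the original specification (the non-numeric class c1 would be caught during input parsing, before such a function is called).

    // A minimal sketch of the tax rules, used to exercise the equivalence classes.
    #include <iostream>

    const long MAX = 37000;                              // assumed upper limit of valid Salary

    long computeTax(long salary) {
        if (salary < 12000 || salary > MAX) return -1;   // invalid input
        if (salary <= 15000) return 0;                   // no tax
        if (salary <= 25000) return salary * 18 / 100;   // 18% of Salary
        return salary * 20 / 100;                        // 20% of Salary
    }

    int main() {
        // one representative value from each of c2..c6
        long representatives[] = { 11000, 13500, 20000, 30000, 40000 };
        for (long s : representatives)
            std::cout << "Salary " << s << " -> tax " << computeTax(s) << '\n';
        return 0;
    }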

We can summarise this discussion as follows. To design test cases using equivalence partitioning, for a range of valid input values identify:
- one valid value within the range
- one invalid value below the range, and
- one invalid value above the range

Similarly, to design test cases for a specific set of values, identify:
- one valid case for each value belonging to the set
- one invalid value

E.g. the test cases for Types of Account (Savings, Current) will be:
- Savings, Current (valid cases)
- Overdraft (invalid case)

It may be noted that we need fewer test cases if some test cases can cover more than one equivalence class.

II. Boundary Value Analysis

Even though the definition of equivalence partitioning states that testing one value from a class is equivalent to testing any other value from that class, we need to look at the boundaries of equivalence classes more closely. This is because boundaries are more error prone. To design test cases using boundary value analysis, for a range of values choose:
- two valid cases, one at each end of the range
- two invalid cases, just beyond the range limits

Consider the example discussed in the previous section. For the valid equivalence class c3 (values of Salary in the range 12000...15000), the test cases using boundary value analysis are:

Input Condition (Salary)   Expected Result (Tax amount)
11999                      Invalid input
12000                      No Tax
15000                      No Tax
15001                      Tax = 18% of Salary

(As before, the Actual/Observed Result and Remarks columns are filled in during test execution.)

If we look closely at the Expected Result column, we can see that for any two successive input values the expected results are different. We need to perform testing using boundary value analysis to ensure that this difference is maintained. The same guidelines need to be followed to check output boundaries as well. Other examples of test cases using boundary value analysis are:
- a compiler being tested with an empty source program
- trigonometric functions like TAN being tested with values near π/2
- a function for deleting a record from a file being tested with an empty data file, or a data file with just one record in it.
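A small sketch of how the boundary cases in the table above might be checked is given below. It reuses the hypothetical computeTax() function assumed in the earlier sketch (only its prototype is repeated here); the expected values simply restate the table.

    // Boundary value checks for the class 12000...15000 (a sketch).
    #include <cassert>

    long computeTax(long salary);   // as defined in the earlier sketch; -1 means invalid input

    void checkBoundaries() {
        assert(computeTax(11999) == -1);                 // just below the class: invalid
        assert(computeTax(12000) == 0);                  // lower boundary: no tax
        assert(computeTax(15000) == 0);                  // upper boundary: no tax
        assert(computeTax(15001) == 15001 * 18 / 100);   // just above: 18% of Salary
    }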

Though the method may sound too simple, boundary value analysis is one of the most effective methods for designing test cases that reveal common errors made in programming.

III. Cause Effect Analysis

The main drawback of the previous two techniques is that they do not explore combinations of input conditions. Cause effect analysis is an approach in which the specifications are studied carefully, the combinations of input conditions (causes) and their effects are identified in the form of a table, and test cases are designed from it. It is suitable for applications in which the combinations of input conditions are few and readily visible.

IV. Cause Effect Graphing

This is a rigorous approach, recommended for complex systems only. In such systems the number of inputs and the number of equivalence classes for each input can be large, and hence the number of input combinations is usually astronomical. We therefore need a systematic approach to select a subset of these input conditions.

Guidelines for graphing:
- Divide the specification into workable pieces, as it may be practically difficult to work on a large specification.
- Identify the causes and their effects. A cause is an input condition or an equivalence class of input conditions. An effect is an output condition or a system transformation.
- Link causes and effects in a Boolean graph, which is the cause-effect graph.
- Make decision tables based on the graph. This is done by having one row for each node in the graph. The number of columns will depend on the number of different combinations of input conditions which can be made.
- Convert the columns of the decision table into test cases.

Consider the following specification. A program accepts a Transaction Code of 3 characters as input. For a valid input the following must be true:
- 1st character (denoting issue or receipt): + for issue, - for receipt
- 2nd character: a digit
- 3rd character: a digit

To carry out cause effect graphing, the Boolean cause-effect graph for this specification is constructed; its nodes are described below.

In the graph:
- (1) or (2) must be true (the symbol ∨ in the graph is to be interpreted as OR)
- (3) and (4) must be true (the symbol ∧ in the graph is to be interpreted as AND)

The Boolean graph is to be interpreted as follows:
- node (1) turns true if the 1st character is +
- node (2) turns true if the 1st character is - (node (1) and node (2) cannot both be true simultaneously)
- node (3) becomes true if the 2nd character is a digit
- node (4) becomes true if the 3rd character is a digit
- the intermediate node (5) turns true if (1) or (2) is true (i.e., if the 1st character is + or -)
- the intermediate node (6) turns true if (3) and (4) are true (i.e., if the 2nd and 3rd characters are digits)
- the final node (7) turns true if (5) and (6) are true (i.e., if the 1st character is + or - and the 2nd and 3rd characters are digits)
- the final node will be true for any valid input and false for any invalid input.

A partial decision table corresponding to the above graph. Each column shows one possible combination of node states, together with a sample test case for that column:

Node                              Some possible combinations of node states
(1)                                0     1     1     1     0     1
(2)                                0     0     0     0     0     0
(3)                                0     0     0     1     1     1
(4)                                0     0     1     0     1     1
(5)                                0     1     1     1     0     1
(6)                                0     0     0     0     1     1
(7)                                0     0     0     0     0     1
Sample Test Case for the Column   $xy   +ab   +a4   +2y   @45   +67
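For reference, the sketch below implements the validity rule that the graph encodes (node (7)) and runs it over the six sample inputs from the decision table. The function name isValidTransactionCode() and the use of std::isdigit are illustrative assumptions, not part of the original text.

    // Sketch of the rule captured by the cause-effect graph: a code is valid when
    // the 1st character is '+' or '-' and the 2nd and 3rd characters are digits.
    #include <cctype>
    #include <iostream>
    #include <string>

    bool isValidTransactionCode(const std::string& code) {
        if (code.size() != 3) return false;
        bool node5 = (code[0] == '+') || (code[0] == '-');                    // (1) or (2)
        bool node6 = std::isdigit(static_cast<unsigned char>(code[1])) &&
                     std::isdigit(static_cast<unsigned char>(code[2]));       // (3) and (4)
        return node5 && node6;                                                // node (7)
    }

    int main() {
        const char* samples[] = { "$xy", "+ab", "+a4", "+2y", "@45", "+67" };
        for (const char* s : samples)
            std::cout << s << (isValidTransactionCode(s) ? " : valid" : " : invalid") << '\n';
        return 0;   // only "+67" should be reported as valid
    }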

The sample test cases are derived by giving values to the input characters such that the nodes turn true or false as given in the columns of the decision table.

V. Error Guessing

Error guessing is a supplementary technique in which test case design is based on the tester's intuition and experience. There is no formal procedure; however, a checklist of common errors can be helpful here.

3.2 White Box Approach
- Basis Path Testing

Basis Path Testing is a white box testing method in which we design test cases to cover every statement, every branch and every predicate (condition) in the code that has been written. Thus the method attempts statement coverage, decision coverage and condition coverage. To perform Basis Path Testing:
- Derive a logical complexity measure of the procedural design:
  o Break the module into blocks delimited by statements that affect the control flow (e.g. statements such as return, exit, jump, and conditions)
  o Mark these out as nodes in a control flow graph
  o Draw connectors (arcs) with arrow heads to mark the flow of logic
  o Identify the number of regions (the Cyclomatic Number), which is equivalent to McCabe's number
- Define a basis set of execution paths:
  o Determine the independent paths
- Derive test cases to exercise (cover) the basis set

McCabe's Number (Cyclomatic Complexity)
- Gives a quantitative measure of the logical complexity of the module
- Defines the number of independent paths
- Provides an upper bound on the number of tests that must be conducted to ensure that all statements are executed at least once.

The complexity of a flow graph G, V(G), is computed in one of three ways:
o V(G) = No. of regions of G
o V(G) = E - N + 2 (E: No. of edges, N: No. of nodes)
o V(G) = P + 1 (P: No. of predicate nodes in G, i.e. the No. of conditions in the code)

McCabe's Number = No. of regions (count the mutually exclusive closed regions and also the whole outer space as one region) = 2 for the graph of this example.

The two other formulae also give the same measure:
McCabe's Number = E - N + 2 (= 6 - 6 + 2 = 2 for the graph)
McCabe's Number = P + 1 (= 1 + 1 = 2 for the graph)

Please note that if there is more than one condition in a single control structure, each condition needs to be marked separately as a node.

When McCabe's number is 2, it indicates that there are two linearly independent paths in the code, i.e., two different ways in which the graph can be traversed from the first node to the last node. The independent paths in the graph are:
i) 1-2-3-5-6
ii) 1-2-4-5-6

The last step is to write test cases corresponding to the listed paths. This means choosing the input conditions in such a way that the above paths are traced by the control of execution. The test cases for the paths listed here are shown in the following table (the Actual Result and Remarks columns are filled in during execution):

Path   Input Condition             Expected Result
i)     value of a > value of b     Increment a by 1
ii)    value of a <= value of b    Increment b by 1
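The flow graph itself is not reproduced here, but a code fragment consistent with the two paths in the table would look like the sketch below; the function and variable names are illustrative assumptions.

    // A sketch of code whose flow graph has cyclomatic complexity 2:
    // one predicate node (a > b), hence V(G) = P + 1 = 2 and two independent paths.
    void adjust(int& a, int& b) {
        if (a > b)        // path i)  : condition true  -> increment a
            a = a + 1;
        else              // path ii) : condition false -> increment b
            b = b + 1;
    }

    // Two test cases cover the basis set:
    //   path i) : a = 5, b = 3  -> expect a == 6, b == 3
    //   path ii): a = 2, b = 7  -> expect a == 2, b == 8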

4. When to Stop Testing

The question arises because testing is never complete, and we cannot scientifically prove that a software system does not contain any more errors.

Common criteria practiced:
- Stop when the scheduled time for testing expires.
- Stop when all the test cases execute without detecting errors.
Both are meaningless and counterproductive: the first can be satisfied by doing absolutely nothing, and the second is equally useless as it does not ensure the quality of the test cases.

Another criterion: stop when all test cases derived from equivalence partitioning, cause-effect analysis and boundary value analysis are executed without detecting errors. Drawbacks:
- Rather than defining a goal and allowing the tester to select the most appropriate way of achieving it, it does the opposite.
- The defined methodologies are not suitable for all occasions.
- There is no way to guarantee that the particular methodology is properly and rigorously used.
- It depends on the abilities of the tester, and no quantification is attempted.

Completion criterion based on the detection of a pre-defined number of errors: in this method the goal of testing is positively defined as finding errors, and hence this is a more goal-oriented approach. For example:
- Testing of a module is not complete until 3 errors are discovered.
- For a system test: detection of 70 errors or an elapsed time of 3 months, whichever comes later.

How to determine the pre-defined number of errors?
Predictive models: based on the history of usage / initial testing and the errors found.

Defect seeding models: based on initial testing and the ratio of detected seeded errors to detected unseeded errors (this depends very critically on the quality of the 'seeding').

Using this approach, as an example, we can say that testing is complete if 80% of the pre-defined number of errors are detected, or the scheduled four months of testing is over, whichever comes later.

Caution: the above condition may never be achieved, for the following reasons:
- over-estimation of the pre-defined number of errors (the software is too good), or
- inadequate test cases.

Hence the best completion criterion may be a combination of the methods discussed:
- Module test: defined by test case design methodologies (such as boundary value analysis).
- Function and system test: based on finding the pre-defined number of defects.

5. Debugging

Debugging occurs as a consequence of successful testing. It is an exercise to connect the external manifestation of an error with its internal cause. Debugging techniques include the use of:

Breakpoints
A point in a computer program at which execution can be suspended to permit manual or automated monitoring of program performance or results.

Desk Checking
A technique in which code listings, test results or other documentation are visually examined, usually by the person who generated them, to identify errors, violations of development standards or other problems.

Dumps

1. A display of some aspect of a computer program's execution state, usually the contents of internal storage or registers.
2. A display of the contents of a file or device.

Single-Step Operation
A debugging technique in which a single computer instruction is executed in response to an external signal.

Traces
A record of the execution of a computer program, showing the sequence of instructions executed, the names and values of variables, or both.

6. References
1. Glenford J. Myers. The Art of Software Testing, 2nd edition, Wiley, 2004.
2. Roger S. Pressman. Software Engineering: A Practitioner's Approach, 5th edition, McGraw-Hill, 2000.

An Abbreviated C++ Code Inspection Checklist

John T. Baldwin October 27, 1992

Copyright 1992 by John T. Baldwin. See rear page for complete information concerning copyright permission, sources, and distribution.

How to Conduct an Informal Code Inspection

1. Code inspector teams consist of 2-5 individuals. The author of the code to be inspected is not part of the initial inspection team! A code inspection is not a witch hunt, so no witch-hunting! Our purpose here is to improve the code, not to evaluate developers.

2. To get ready for the inspection, print separate hardcopies of the source code for each inspector. A single code inspector should cover no more than 250 source code lines, including comments, but not including whitespace. Surprisingly enough, this is a "carved in stone" limit! The hardcopy should contain a count of the source code lines shown.

3. Inspection overview. The code author spends 20 - 40 minutes explaining the general layout of the code to the inspectors. The inspectors are not allowed to ask questions; the code is supposed to answer them, but this overview is designed to speed up the process. The author's goal is to stick to the major, important points, and keep it as close to 20 minutes as possible without undercutting the explanation.

4. Individual inspections. Each inspector uses the attached checklist to try to put forward a maximum number of discovered possible defects. This should be done in a single, uninterrupted sitting. The inspector should have a goal of covering 70-120 source lines of code per hour. Use the source line counts on the hardcopy, and strive not to inspect too quickly, nor too slowly! [This has been shown in several studies to be the next major factor after programming experience which affects the number of errors found. There is a sharp drop-off beyond 122 sloc/hr, so don't rush!] To do the inspection, go through the code line by line, attempting to fully understand what you are reading. At each line or block of code, skim through the inspection checklist, looking for questions which apply. For each applicable question, find whether the answer is "yes." A yes answer means a probable defect. Write it down. You will notice that some of the questions are very low-level and concern themselves with syntactical details, while others are high-level and require an understanding of what a block of code does. Be prepared to change your mental focus.

5. Meeting. The meeting is attended by all the code inspectors for that chunk of code. If you want this to be more like a formal inspection, the meeting should have a moderator who is well experienced in C or C++ and in conducting code inspections, and the author of the code should not be present. To be more like a walkthrough, the moderator may be omitted, or the author may be present, or both. If the author is present, it is for the purpose of collecting feedback, not for defending or explaining the code. Remember, one of the major purposes of the inspection is to ensure that the code is sufficiently self-explanatory. Each meeting is strictly limited to two hours duration, including interruptions. This is because inspection ability generally drops off after this amount of time. Strive to stay on task, and to not allow interruptions.


Different inspectors may cover different groups of code for a single meeting. Thus, a single meeting could theoretically cover a maximum of (5 inspectors) x (120 sloc/hr) x (2 hrs) = 1200 lines of source code. In actuality, there should be some overlap between inspectors, up to the case of everyone having inspected the same code. If the group is not finished at the end of two hours, quit. Do not attempt to push ahead. The moderator or note taker should submit the existing notes to the author or maintainer, and the remaining material should be covered in a subsequent meeting.

6. Rework. The defects list is submitted to the author, or to another assigned individual, for "rework." This can consist of changing code, adding or deleting comments, restructuring or relocating things, etc. Note that solutions are not discussed at the inspection meeting! They are neither productive nor necessary in that setting. If the author/maintainer desires feedback on solutions or improvements, he or she may hold a short meeting with any or all of the inspectors, following the code inspection meeting. The "improvements" meeting is led by the author/maintainer, who is free to accept or reject any suggestions from the attenders.

7. Follow up. It is the moderator's personal responsibility to ensure all defects have been satisfactorily reworked. If there is no formal moderator, then an individual is selected for this role at the inspection meeting. The correctness of the rework will be verified either at a short review meeting, or during later inspection stages of the project.

8. Record keeping. In order to objectively track success in detecting and correcting defects, one of the by-products of the meeting will be a count of the total number of different types of potential defects noted. In order to eliminate both the perception and the possibility that the records will be used to evaluate developers (remember, the goal is to improve the software), neither the name of the author nor the source module will be noted in the defect counts. If it is absolutely necessary to keep the counts straight, a "code module number" may be assigned and used. The document containing the pairing of code modules with their numbers will be maintained by a single individual who has no management responsibilities on the project, and this document will be destroyed upon completion of the code development phase of the project.


C++ Inspection Checklist

1 VARIABLE DECLARATIONS

1.1 Arrays

1.1.1 Is an array dimensioned to a hard-coded constant?
    int intarray[13];
should be
    int intarray[TOT_MONTHS+1];

1.1.2 Is the array dimensioned to the total number of items? char entry[TOTAL_ENTRIES]; should be char entry[LAST_ENTRY+1]; The first example is extremely error-prone and often gives rise to off-by-one errors in the code. The preferred (second) method permits the writer to use the LAST_ENTRY identifier to refer to the last item in the array. Instances which require a buffer of a certain size are rarely rendered invalid by this practice, which results in the buffer being one element bigger than absolutely necessary. 1.2 Constants

1.2.1 Does the value of the variable never change?
    int months_in_year = 12;
should be
    const unsigned months_in_year = 12;

1.2.2 Are constants declared with the preprocessor #define mechanism? #define MAX_FILES 20

should be const unsigned MAX_FILES = 20;


1.2.3 Is the usage of the constant limited to only a few (or perhaps only one) class? If so, is the constant global?
    const unsigned MAX_FOOS = 1000;
    const unsigned MAX_FOO_BUFFERS = 40;
should be
    class foo {
    public:
        enum { MAX_INSTANCES = 1000 };
        ...
    private:
        enum { MAX_FOO_BUFFERS = 40 };
        ...
    };
If the size of the constant exceeds int, another mechanism is available:
    class bar {
    public:
        static const long MAX_INSTS;
        ...
    };
    const long bar::MAX_INSTS = 70000L;
The keyword static ensures there is only one instance of the variable for the entire class. Static data items are not permitted to be initialized within the class declaration, so the initialization line must be included in the implementation file for class bar. Static constant members have one drawback: you cannot use them to declare member data arrays of a certain size. This is because the value is not available to the compiler at the point at which the array is declared in the class.

1.3 Scalar Variables

1.3.1 Does a negative value of the variable make no sense? If so, is the variable signed?
    int age;
should be
    unsigned int age;

This is an easy error to make, since the default types are usually signed.


1.3.2 Does the code assume char is either signed or unsigned?
    typedef char SmallInt;
    SmallInt mumble = 280;    // WRONG on Borland C++ 3.1 or MSC/C++ 7.0!
The typedefs should be
    typedef unsigned char SmallUInt;
    typedef signed char   SmallInt;

1.3.3 Does the program unnecessarily use float or double?
    double acct_balance;
should be
    unsigned long acct_balance;

In general, the only time floating point arithmetic is necessary is in scientific or navigational calculations. It is slow, and subject to more complex overflow and underflow behavior than integer math is. Monetary calculations, as above, can often be handled in counts of cents, and formatted properly on output. Thus, acct_balance might equal 103446, and print out as $1,034.46. 1.4 Classes

1.4.1 Does the class have any virtual functions? If so, is the destructor non-virtual? Classes having virtual functions should always have a virtual destructor. This is necessary since it is likely that you will hold an object of a class with a pointer of a less-derived type. Making the destructor virtual ensures that the right code will be run if you delete the object via the pointer.
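A minimal illustration of this point (the class names are illustrative, not from the checklist):

    // Deleting a Derived object through a Base* without a virtual destructor
    // gives undefined behaviour (in practice Derived's destructor is typically skipped).
    class Base {
    public:
        virtual void draw() = 0;
        virtual ~Base() {}          // virtual because the class has virtual functions
    };

    class Derived : public Base {
    public:
        void draw() {}
        ~Derived() { /* release resources owned by Derived */ }
    };

    // Base* p = new Derived;
    // delete p;                    // correct: ~Derived() then ~Base() are run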

1.4.2 Does the class have any of the following: Copy-constructor Assignment operator Destructor If so, it generally will need all three. (Exceptions may occasionally be found for some classes having a destructor with neither of the other two.)


2 DATA USAGE

2.1 Strings

2.1.1 Can the string ever not be null-terminated? 2.1.2 Is the code attempting to use a strxxx() function on a non-terminated char array, as if it were a string?
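A sketch of the classic mistake behind these two questions (the buffer name and sizes are illustrative):

    // strncpy() does not null-terminate the destination when the source is at
    // least as long as the count, so a later strxxx() call reads past the buffer.
    #include <cstring>

    void copyName(const char* source) {
        char name[8];
        std::strncpy(name, source, sizeof(name));   // may leave name unterminated
        // std::strlen(name);                       // undefined behaviour if unterminated
        name[sizeof(name) - 1] = '\0';              // fix: terminate explicitly
    }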

2.2

Buffers

2.2.1 Are there always size checks when copying into the buffer? 2.2.2 Can the buffer ever be too small to hold its contents? For example, one program had no size checks when reading data into a buffer because the correct data would always fit. But when the file it read was accidentally overwritten with incorrect data, the program crashed mysteriously. 2.3 Bitfields

2.3.1 Is a bitfield really required for this application? 2.3.2 Are there possible ordering problems (portability)?

3 INITIALIZATION

3.1 Local Variables

3.1.1 Are local variables initialized before being used?
3.1.2 Are C++ locals created, then assigned later? This practice has been shown to incur up to 350% overhead, compared to the practice of declaring the variable later in the code, when an initialization value is known. It is the simple matter of putting a value in once, instead of assigning some default value, then later throwing it away and assigning the real value.
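A sketch of the difference described in 3.1.2 (std::string and the literal value are illustrative):

    #include <string>

    void report_slow() {
        std::string title;                          // default construction
        title = "Inspection report";                // assignment throws the empty state away
    }

    void report_fast() {
        std::string title("Inspection report");     // initialized once, where the value is known
    }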


3.2

Missing Reinitialization

3.2.1 Can a variable carry an old value forward from one loop iteration to the next? Suppose the processing of a data element in a sequence causes a variable to be set. For example, a file might be read, and some globals initialized for that file. Can those globals be used for the next file in the sequence without being re-initialized?

4 MACROS

4.1 If a macro's formal parameter is evaluated more than once, is the macro ever expanded with an actual parameter having side effects? For example, what happens in this code:
    #define max(a,b)   ( (a) > (b) ? (a) : (b) )
    max(i++, j);

4.2 If a macro is not completely parenthesized, is it ever invoked in a way that will cause unexpected results?
    #define max(a, b)   (a) > (b) ? (a) : (b)
    result = max(i, j) + 3;

This expands into: result = (i) > (j) ? (i) : (j)+3; See the example in 4.1 for the correct parenthesization.

4.3 If the macro's arguments are not parenthesized, will this ever cause unexpected results?
    #define IsXBitSet(var) (var && bitmask)
    result = IsXBitSet( i || j );
This expands into:
    result = (i || j && bitmask);    // not what was expected!
The correct form is:
    #define IsXBitSet(var) ((var) && (bitmask))


5 SIZING OF DATA

5.1 In a function call with arguments for a buffer and its size, is the argument to sizeof different from the buffer argument? For example:
    memset(buffer1, 0, sizeof(buffer2));   // danger!

This is not always an error, but it is a dangerous practice. Each instance should be verified as (a) necessary, and (b) correct, and then commented as such.

5.2 Is the argument to sizeof an incorrect type? Common errors:
    sizeof(ptr)     instead of   sizeof(*ptr)
    sizeof(*array)  instead of   sizeof(array)
    sizeof(array)   instead of   sizeof(array[0])   (when the user wanted the size of an element)

6 DYNAMIC ALLOCATION

6.1 Allocating Data

6.1.1 Is too little space being allocated? 6.1.2 Does the code allocate memory and then assume someone else will delete it? This is not always an error, but should always be prominently documented, along with the reason for implementing in this manner. Constructors which allocate, paired with destructors which deallocate, are an obvious exception, since a single object has control of its class data.

6.1.3 Is malloc(), calloc(), or realloc() used in lieu of new? C standard library allocation functions should never be used in C++ programs, since C++ provides an allocation operator.


If you find you must mix C allocation with C++ allocation:
6.1.4 Is malloc, calloc, or realloc invoked for an object which has a constructor? Program behavior is undefined if this is done.

6.2 Deallocating Data

6.2.1 Are arrays being deleted as if they were scalars?
    delete myCharArray;
should be
    delete [] myCharArray;

6.2.2 Does the deleted storage still have pointers to it? It is recommended that pointers are set to NULL following deletion, or to another safe value meaning "uninitialized." This is neither necessary nor recommended within destructors, since the pointer variable itself will cease to exist upon exiting.

6.2.3 Are you deleting already-deleted storage? This is not possible if the code conforms to 6.2.2. The draft C++ standard specifies that it is always safe to delete a NULL pointer, so it is not necessary to check for that value.

If C standard library allocators are used in a C++ program (not recommended):

6.2.4 Is delete invoked on a pointer obtained via malloc, calloc, or realloc?

6.2.5 Is free invoked on a pointer obtained via new? Both of these practices are dangerous. Program behavior is undefined if you do them, and such usage is specifically deprecated by the ANSI draft C++ standard.


7 POINTERS

7.1 When dereferenced, can the pointer ever be NULL?
7.2 When copying the value of a pointer, should it instead allocate a copy of what the first pointer points to?
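A sketch of the distinction in 7.2 (the struct and function names are illustrative):

    // Copying the pointer shares one buffer; copying what it points to makes an
    // independent duplicate. Which one is intended should be a deliberate choice.
    #include <cstring>

    struct Record {
        char* text;
    };

    void shallowCopy(Record& dst, const Record& src) {
        dst.text = src.text;                         // both records now share one buffer
    }

    void deepCopy(Record& dst, const Record& src) {
        dst.text = new char[std::strlen(src.text) + 1];
        std::strcpy(dst.text, src.text);             // dst owns its own copy (caller must delete [] it)
    }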

8 CASTING

8.1 Is NULL cast to the correct type when passed as a function argument?
8.2 Does the code rely on an implicit type conversion? C++ is somewhat charitable when arguments are passed to functions: if no function is found which exactly matches the types of the arguments supplied, it attempts to apply certain type conversion rules to find a match. While this saves unnecessary casting, if more than one function fits the conversion rules, it will result in a compilation error. Worse, it can cause additions to the type system (either from adding a related class, or from adding an overloaded function) to cause previously working code to break! See Appendix (A) for an example.

9 COMPUTATION

9.1 When testing the value of an assignment or computation, is the parenthesization incorrect?
    if ( a = function() == 0 )
should be
    if ( (a = function()) == 0 )

9.2

Can any synchronized values not get updated? Sometimes, a group of variables must be modified as a group to complete a single conceptual "transaction." If this does not occur all in one place, is it guaranteed that all variables get updated if a single value changes? Do all updates occur before any of the values are tested or used?


10 CONDITIONALS

10.1 Are exact equality tests used on floating point numbers?
    if ( someVar == 0.1 )
might never be evaluated as true. The constant 0.1 is not exactly representable by any finite binary mantissa and exponent, thus the compiler must round it to some other number. Calculations involving someVar may never result in it taking on that value. Solution: use >, >=, <, or <=, depending on which direction you wish the variable bound.

10.2

Are unsigned values tested greater than or equal to zero? if ( myUnsignedVar >= 0 ) will always evaluate true.

10.3

Are signed variables tested for equality to zero or another constant?
    if ( mySignedVar )         // not always good
    if ( mySignedVar >= 0 )    // better!
    if ( mySignedVar <= 0 )    // opposite case

If the variable is updated by any means other than ++ or --, it may miss the value of the test constant entirely. This can cause subtle and frightening bugs when code executes under conditions that weren't planned for.

10.4

If the test is an error check, could the "error condition" actually be legitimate in some cases?


11

FLOW CONTROL

11.1 Control Variables

11.1.1 Is the lower limit an exclusive limit?
11.1.2 Is the upper limit an inclusive limit?
By always using inclusive lower limits and exclusive upper limits, a whole class of off-by-one errors is eliminated. Furthermore, the following assumptions always apply:
- the size of the interval equals the difference of the two limits
- the limits are equal if the interval is empty
- the upper limit is never less than the lower limit
Examples: instead of saying x>=23 and x<=42, use x>=23 and x<43.

11.2 Branching

11.2.1 In a switch statement, is any case not terminated with a break statement? When several cases are followed by the same block of code, they may be "stacked" together and the code terminated with a single break. Cases may also be exited via return. All other circumstances requiring "drop through" cases should be clearly documented in a strategic comment before the switch. This should only be used when it makes the code simpler and clearer.

11.2.2 Does the switch statement lack a default branch? There should always be a default branch to handle unexpected cases, even when it appears that the code can never get there.
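A small sketch combining 11.2.1 and 11.2.2 above (the enum and the function are illustrative):

    // Stacked cases share one block and one break; an intentional fall-through is
    // marked with a comment; every switch gets a default branch.
    enum Command { OPEN, CLOSE, SAVE, SAVE_AS, QUIT };

    void dispatch(Command c) {
        switch (c) {
        case OPEN:
        case CLOSE:            // stacked: both cases use the same handling
            /* handle open/close */
            break;
        case SAVE_AS:
            /* ask for a file name, then drop through to the save code */
            // FALL THROUGH (intentional)
        case SAVE:
            /* write the file */
            break;
        case QUIT:
            /* shut down */
            break;
        default:               // unexpected value: should "never" happen
            /* report an internal error */
            break;
        }
    }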

11.2.3 Does a loop set a boolean flag in order to effect an exit? Consider using break instead. It is likely to simplify the code.


11.2.4 Does the loop contain a continue? If the continue occurs in the body of an if conditional, consider replacing it with an else clause if it will simplify the code.

12

ASSIGNMENT

12.1 Assignment operator 12.1.1 Does "a += b" mean something different than "a = a + b"? The programmer should never change the semantics of relationships between operators. For the example here, the two statements above are semantically identical for intrinsic types (even though the code generated might be different), so for a user defined class, they should be semantically identical, too. They may, in fact, be implemented differently (+= should be more efficient).

12.1.2 Is the argument for a copy constructor or assignment operator non-const? 12.1.3 Does the assignment operator fail to test for self-assignment? The code for operator=() should always start out with: if ( this == &right_hand_arg ) return *this;

12.1.4 Does the assignment operator return anything other than a const reference to this? Failure to return a reference to this prevents the user from writing (legal C++):
    a = b = c;
Failure to make the return reference const allows the user to write (illegal C++):
    (a = b) = c;

12.2 Use of assignment

12.2.1 Can this assignment be replaced with an initialization? (See question 3.1.2 and commentary.)
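A sketch of an assignment operator consistent with questions 12.1.3 and 12.1.4 above. The class is illustrative (constructors and destructor omitted for brevity), and returning a const reference follows this checklist's convention rather than any single universal style.

    #include <cstring>

    class Buffer {
        char* data;
    public:
        const Buffer& operator=(const Buffer& rhs) {
            if (this == &rhs) return *this;          // 12.1.3: guard against self-assignment
            char* copy = new char[std::strlen(rhs.data) + 1];
            std::strcpy(copy, rhs.data);
            delete [] data;
            data = copy;
            return *this;                            // 12.1.4: allows a = b = c
        }
    };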


12.2.2 Is there a mismatch between the units of the expression and those of the variable? For example, you might be calculating the number of bytes for an array when the number of elements was requested. If the elements are big (say, a long, or a struct!), you'd be using way too much memory.

13 ARGUMENT PASSING

13.1 Are non-intrinsic type arguments passed by value?
    Foo& do_something( Foo anotherFoo, Bar someThing );

should be Foo& do_something( const Foo& anotherFoo, const Bar& someThing ); While it is cheaper to pass ints, longs, and such by value, passing objects this way incurs significant expense due to the construction of temporary objects. The problem becomes more severe when inheritance is involved. Simulate pass-by-value by passing const references.

14 RETURN VALUES

14.1 Is the return value of a function call being stored in a type that is too narrow? (See Appendix (B).)
14.2 Does a public member function return a non-const reference or pointer to member data?
14.3 Does a public member function return a non-const reference or pointer to data outside the object? This is permissible provided the data was intended to be shared, and this fact is documented in the source code.
14.4 Does an operator return a reference when it should return an object?
14.5 Are objects returned by value instead of const references? (See question 13.1 and commentary.)


15

FUNCTION CALLS

15.1 Varargs functions (printf, and other functions with ellipsis ...)

15.1.1 Is the FILE argument of fprintf missing? (This happens all the time.)
15.1.2 Are there extra arguments?
15.1.3 Do the argument types explicitly match the conversion specifications in the format string? (printf and friends.) Type checking cannot occur for functions with variable length argument lists. For example, a user was surprised to see nonsensical values when the following code was executed:
    printf(" %d %ld \n", a_long_int, another_long_int);
On that particular system, ints and longs were different sizes (2 and 4 bytes, respectively). printf() is responsible for manually accessing the stack; thus, it saw "%d" and grabbed 2 bytes (an int). It then saw "%ld" and grabbed 4 bytes (a long). The two values printed were the MSW of a_long_int, and the combination of a_long_int's LSW and another_long_int's MSW. Solution: ensure types explicitly match. If necessary, arguments may be cast to smaller sizes (long to int) if the author knows for certain that the smaller type can hold all possible values of the variable.

15.2 General functions

15.2.1 Is this function call correct? That is, should it be a different function with a similar name? (E.g. strchr instead of strrchr?)

15.2.2 Can this function violate the preconditions of a called function?


16 FILES

16.1 Can a temporary file name not be unique? (This is, surprisingly enough, a common design bug.)

16.2

Is a file pointer reused without closing the previous file? fp = fopen(...); fp = fopen(...);

16.3

Is a file not closed in case of an error return?


Appendix
A. Errors due to implicit type conversions.

Code which relies upon implicit type conversions may become broken when new classes or functions are added. For example:

    class String {
    public:
        String( char *arg );              // conversion constructor from char*
        operator const char* () const;
        // ...
    };

    void foo( const String& aString );
    void bar( const char *anArray );

    // Now, we add the following class
    class Word {
    public:
        Word( char *arg );                // conversion constructor from char*
        // ...
    };

    // and we need another foo that works with "Words"
    void foo( const Word& aWord );

    int gorp()
    {
        foo("hello");          // This used to work! Now it breaks! What gives?

        String baz = "quux";
        bar(baz);              // but this still works.
    }

The code worked before class Word and the second foo() were added. Even though there was no foo() accepting an argument of type const char * (i.e. a constant string like "hello"), there is a foo() which takes a constant String argument by reference. And (un)fortunately, there is also a way to convert Strings to char *'s and vice versa. So the compiler performed the implicit conversion.

Now, with the addition of class Word, and another foo() which works with it, there is a problem. The line which calls foo("hello") matches both: void foo( const String& ); void foo( const Word& ); Since the mechanisms of the failure may be distributed among two or more header files in addition to the implementation file, along with a lot of other code, it may be difficult to find the real problem. The easiest solution is to recognize while coding or inspecting that a function call results in implicit type conversion, and either (a) overload the function to provide an explicitlytyped variant, or (b) explicitly cast the argument. Option (a) is preferred over (b), since (b) defeats automatic type checking. Option (a) can still be implemented very efficiently, simply by writing the new function as a forwarding function and making it inline.

B.

Errors due to loss of "precision" in return values

Functions which can return EOF should not have their return values stored in a char variable. For example:

    int getchar(void);
    char chr;
    while ( (chr = getchar()) != EOF ) {
        ...
    }

should be:

    int tmpchar;
    while ( (tmpchar = getchar()) != EOF ) {
        chr = (char) tmpchar;    // or use the casted tmpchar throughout
        ...
    }

The practice in the top example is unsafe because functions like getchar() may return 257 different values: valid characters with indexes 0-255, plus EOF (-1). If sizeof(int) > sizeof(char), then information will be lost when the high-order byte(s) are scraped off prior to the test for EOF. This can cause the test to fail. Worse yet, depending on whether char is signed or unsigned by default on the particular compiler and machine being used, sign extension can wreak havoc and cause some of these loops never to terminate.

C.

Loop Checklist

The following loops are indexed correctly, and are handy for comparisons when doing inspections. If the actual code doesn't look like one of these, chances are that something is wrong or at least that something could be clearer. Acceptable forms of for loops which avoid off-by-one errors. for ( i = 0; i <= max_index; ++i ) for ( i = 0; i < sizeof(array); ++i ) for ( i = max_index; i >= 0; --i ) for ( i = max_index; i ; --i )

Copyright Notices
1. Some of the questions applicable to conventional C contained herein were modified or taken from A Question Catalog for Code Inspections, Copyright 1992 by Brian Marick. Portions of his document were Copyright 1991 by Motorola, Inc., which graciously granted him rights to those portions. In conformance with his copyright notice, the following contact information is provided below: Brian Marick Testing Foundations 809 Balboa, Champaign, IL 61820 (217) 351-7228 marick@cs.uiuc.edu, marick@testing.com "You may copy or modify this document for personal use, provided you retain the original copyright notice and contact information." 2. Some questions and comment material were modified from Programming in C++, Rules and Recommendations, Copyright 1990-1992 by Ellemtel Telecommunication Systems Laboratories. In conformance with their copyright notice: "Permission is granted to any individual or institution to use, copy, modify, and distribute this document, provided that this complete copyright and permission notice is maintained intact in all copies." 3. Finally, all modifications and remaining original material are: Copyright 1992 by John T. Baldwin. All Rights Reserved. John T. Baldwin 1511 Omie Way Lawrenceville, GA 30243 1-404-3399621 johnb@searchtech.co m

Permission is granted to any institution or individual to copy, modify, distribute, and use this document, provided that the complete copyright, permission, and contact information applicable to all copyright holders specified herein remains intact in all copies of this document.

C - Code Inspection Checklist

Title: Software Engineering in the UNIX/C Environment Author: William Bruce Frakes, Brian A. Nejmeh, Christopher John Fox Publisher: Prentice Hall Date Published: February 1991

Data-Declaration Errors
1. Is each data structure correctly typed?
2. Is each data structure properly initialized?
3. Are descriptive data structure names used?
4. Could global data structures be made local?
5. Have all data structures been explicitly declared?
6. Is the initialization of a data structure consistent with its type?
7. Are there data structures that should be defined in a type definition?
8. Are there data structures with confusingly similar names?

Data-Reference Errors
1. Is a variable referenced whose value is uninitialized or not set to its proper value?
2. For all array references, is each subscript value within the defined bounds?
3. For pointer references, is the correct level of indirection used?
4. For all references through pointers, is the referenced storage currently allocated to the proper data?
5. Are all defined variables used?
6. Is the #define construct used when appropriate instead of having hard-wired constants in functions?

Computation Errors
1. Are there missing validity tests (e.g., is the denominator not too close to zero)?
2. Is the correct data being operated on in each statement?
3. Are there any computations involving variables having inconsistent data types?
4. Is overflow or underflow possible during a computation?
5. For expressions containing more than one operator, are the assumptions about order of evaluation and precedence correct? Are parentheses used to avoid ambiguity?
6. Do all variables used in a computation contain the correct values?

Comparison Errors
1. Is the "=" expression used in a comparison instead of "=="?
2. Is the correct condition checked (e.g., if (FOUND) instead of if (!FOUND) )?
3. Is the correct variable used for the test (e.g., X == TRUE instead of FOUND == TRUE )?
4. Are there any comparisons between variables of inconsistent types?
5. Are the comparison operators correct?
6. Is each boolean expression correct?
7. Are there improper and unnoticed side-effects of a comparison?
8. Has an "&" inadvertently been interchanged with a "&&", or a "|" for a "||"?

Control-Flow Errors
1. Are null bodied if, else, or other control structure constructs correct?
2. Will all loops terminate?
3. Is there any unreachable code?
4. Do the most frequently occurring cases in a switch statement appear as the earliest cases?
5. Is the most frequently exercised branch of an if-else statement the if statement?
6. Are there any unnecessary branches?
7. Are if statements at the same level when they do not need to be (e.g., they can be nested within each other)?
8. Are goto's avoided?
9. Are out-of-boundary conditions properly handled?
10. When there are multiple exits from a loop, is each exit necessary? If so, is each exit handled properly?
11. Is the nesting of loops and branches correct?
12. Are all loop terminations correct?
13. Does each switch statement have a default case?
14. Are there switch cases missing break statements? If so, are these marked with comments?
15. Does the function eventually terminate?
16. Is it possible that a loop or condition will never be executed (e.g., flag = FALSE; if ( flag == TRUE ) )?
17. For a loop controlled by iteration and a boolean expression (e.g., a searching loop), what are the consequences of falling through the loop? For example, consider the following while statement:
    while ( !found && (i < LIST_SIZE) )
What happens if found becomes TRUE? Can found become TRUE?
18. Are there any "off by one" errors (e.g., one too many or too few iterations)?
19. Are there any "dangling elses" in the function (recall that an else is always associated with the closest unmatched if)?
20. Are statement lists properly enclosed in { }?

Input-Output Errors
1. Have all files been opened before use?
2. Are the attributes of the open statement consistent with the use of the file (e.g., read, write)?
3. Have all files been closed after use?
4. Is buffered data flushed?
5. Are there spelling or grammatical errors in any text printed or displayed by the function?
6. Are error conditions checked?

Interface Errors
1. Are the number, order, types, and values of parameters received by a function correct?
2. Do the values in units agree (e.g., inches versus yards)?
3. Are all output variables assigned values?
4. Are call by reference and call by value parameters used properly?
5. If a parameter is passed by reference, does its value get changed by the function called? If so, is this correct?

Comment Errors
1. Is the underlying behavior of the function expressed in plain language?
2. Is the interface specification of the function consistent with the behavior of the function?
3. Do the comments and code agree?
4. Do the comments help in understanding the code?
5. Are useful comments associated with each block of code?

6. Are there enough comments in the code? 7. Are there too many comments in the code?

Modularity Errors
1. Can the underlying behavior of the function be expressed in plain English? 2. Is there a low level of coupling among functions (e.g., is the degree of dependency on other functions low)? 3. Is there a high level of cohesion among functions (e.g., is there a strong relationship among functions in a module)? 4. Is there repetitive code (common code) throughout the function(s) that can be replaced by a call to a common function that provides the behavior of the repetitive code? 5. Are library functions used where and when appropriate?

Storage Usage Errors


1. Is statically allocated memory large enough (e.g., is the dimension of an array large enough)? 2. Is dynamically allocated memory large enough? 3. Is memory being freed when appropriate?

Performance Errors
1. Are frequently used variables declared with the register construct? 2. Can the cost of recomputing a value be reduced by computing the function once and storing the results? 3. Can a more concise storage representation be used? 4. Can a computation be moved outside a loop without affecting the behavior of the loop? 5. Are there tests within a loop that do not need to be done? 6. Can a short loop be unrolled? 7. Are there two loops that operate on the same data that can be combined into one loop? 8. Is there a way of exploiting algebraic rules to reduce the cost of evaluating a logical expression? 9. Are the logical tests arranged such that the often successful and inexpensive tests precede the more expensive and less frequently successful tests? 10. Is code written unclearly for the sake of "efficiency"?

Maintenance Errors
1. Is the function well enough documented that someone other than the implementer could confidently change the function? 2. Are any expected changes missing? 3. Do the expected changes require a considerable amount of change to the function? 4. Is the style of the function consistent with the coding standards of the project?

Traceability Errors
1. Has the entire design allocated to the function(s) been satisfactorily implemented? 2. Has additional functionality beyond that specified in the design of the function(s) been implemented?

Software Requirements Specification (SRS) Checklist (Alfred Hussein, University of Calgary)

1. Is the SRS correct? Each requirement in the SRS must be free from error. This requires that each requirement statement accurately represent the functionality required of the system to be built. While the other categories within this checklist also result in errors, this category is concerned with the technical nature of the application at hand; in other words, the requirement is just plain wrong. For example, if the problem domain states that the XYZ system is to provide a response to input within 5 seconds and the SRS requirement specifies that the XYZ system will respond within 10 seconds, the requirement is in error.

2. Are the requirements in the SRS unambiguous, precise, and clear? Each requirement in the SRS must be exact and unambiguous: there is one and only one interpretation for every requirement. The meaning of each requirement should be easily understood and easy to read. Requirement statements should be short, explicit, precise and clear; rambling, verbose descriptions are generally subject to interpretation. Each requirement should have a specific purpose and represent a specific characteristic of the problem domain. When writing a requirement, try to imagine that it is to be given to ten people who are asked for their interpretation; if there is more than one such interpretation, then the requirement is probably ambiguous. It is also important to remain objective when writing requirements; never assume that everyone will understand a requirement the way you understand it. At a minimum, all terms that could have multiple meanings should be defined in a glossary where their meanings are made more specific. The difficulty of ambiguity stems from the use of natural language, which is itself inherently ambiguous. It is recommended that checklists of words and grammatical constructs that are particularly prone to ambiguity be used to aid in identifying possible errors of this type. Refer to Appendix B for a checklist of potential ambiguity indicators.

3. Is the SRS complete? An SRS is considered complete if all requirements needed to specify the problem have been defined and the SRS is complete as a document. The following qualities indicate a complete SRS:

- Everything the software is supposed to do is included in the SRS. This includes functionality, performance, design constraints, and external interfaces. This is a difficult quality to judge because it requires a complete understanding of the problem domain. It is further complicated by the implication that something is missing from the SRS, and it is not a simple process to find something that is not present by examining what is present.
- Definitions of the responses of the software to all realizable classes of input data in all realizable classes of situations are included. This covers the responses to both valid and invalid input, and implies that for every input mentioned in the SRS, the SRS specifies what the appropriate output will be.
- Conformity to the applicable SRS standard. If a particular section is not applicable, the SRS should include the section number and an explanation of why it is not applicable.
- All pages are numbered; all figures and tables are numbered, named and referenced; all terms and units of measure are provided; and all referenced material and sections are present. This is completeness from a word-processing perspective.
- No sections are marked "To Be Determined (TBD)". The use of "TBD" in a section of the SRS should be avoided whenever possible. If this is unavoidable, each TBD should be appended with a notation that explains why it can't be completed, what needs to be done to complete the section, who will complete the section and when it will be completed. This ensures that the TBD is not interpreted as an excuse to delay completion of the SRS indefinitely. By including the name of the responsible individual and the date, we ensure that the TBD expires at some point.

4. Is the SRS verifiable or testable? An SRS is verifiable if and only if every requirement statement in it is verifiable. A requirement is verifiable if and only if there is some finite, cost-effective way in which a person or machine can check that the software product meets the requirement. There should be some quantitative way to test the requirement. When assessing the testability of a requirement, consider whether it can be checked by actual test cases, by analysis, or by inspection (a sketch of such a test follows the list below). There are a number of reasons why a requirement may not be verifiable:

o The requirement is ambiguous. Ambiguity leads to non-verifiability: there is no way to verify software against a requirement that can be interpreted in more than one way.
o The requirement uses non-measurable quantities such as "usually" or "often", which implies the absence of a finite test process. Refer to Appendix C for additional non-quantifiable measures.
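To make the idea of a finite, cost-effective check concrete, the following is a minimal sketch of how a quantitative requirement (the 5-second response bound used as an example under item 1) could be turned into an automated test. The function name xyz_system_respond and the requirement ID are invented for the illustration:

import time

# Hypothetical stand-in for the XYZ system's input handler; a real
# verification effort would call the actual system under test.
def xyz_system_respond(user_input: str) -> str:
    return f"ack: {user_input}"

def test_response_time_requirement(max_seconds: float = 5.0) -> None:
    """REQ-PERF-001 (illustrative ID): the system shall respond to an input
    within 5 seconds.  Because the bound is quantitative, the requirement
    is verifiable by a finite test such as this one."""
    start = time.perf_counter()
    xyz_system_respond("sample input")
    elapsed = time.perf_counter() - start
    assert elapsed <= max_seconds, (
        f"response took {elapsed:.2f}s, exceeding the {max_seconds}s bound"
    )

if __name__ == "__main__":
    test_response_time_requirement()
    print("REQ-PERF-001 check passed")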

5. Is the SRS consistent? An SRS is consistent if and only if its stated requirements do not conflict with one another. Inconsistency can manifest itself in a number of ways:

o Conflicting terms: two terms are used in different places to mean the same thing. For example, one requirement uses the term prompt to denote a message asking the user to input data, while another requirement uses the term cue to mean the same thing.
o Conflicting characteristics: two requirements in the SRS demand that the software exhibit contradictory attributes. For example, one requirement states that all inputs shall be via a menu interface while another states that all inputs shall be via a command language.
o Temporal or logical inconsistency: two parts of the SRS specify conflicting timing characteristics or logic. For example, one requirement may state that system A will run only while system B is running, while another may state that system A will start 15 seconds after system B starts.

A logical inconsistency might be one requirement stating that the software will multiply the user inputs while another states that the software will add them. Inconsistency can also occur between the requirements and their source. It is important to ensure that the terminology used is the same as the terminology used in the source documents; using the same vocabulary as the source documents works towards eliminating this form of inconsistency.

6. Does the SRS deal only with the problem? The SRS is a statement of the requirements that must be satisfied, and these requirements should not be obscured by design detail. Requirements should state "what" is required at the appropriate system level, not "how". In some cases a requirement may dictate how a task is to be accomplished, but this is always a customer prerogative.

Requirements must be detailed enough to specify "what" is required, yet provide sufficient freedom so that the design is not constrained. Avoid telling the designer "how" to do the job; instead state "what" has to be accomplished.

7. Is each requirement in the SRS feasible? Each requirement in the SRS can be implemented with the techniques, tools, resources, and personnel that are available within the specified cost and schedule constraints. Set performance bounds based on system requirements and not on state-of-the-art capability. Specify what functions and performance attributes the user requires without committing to any particular physical solution; requirements should not be technology-oriented. If growth capability is to be included, state it and place bounds on it; vague growth statements are generally useless. The level to which a "hook" is to be designed into the system should be clearly identified.

8. Is the SRS modifiable? An SRS is modifiable if its structure and style are such that any necessary changes to the requirements can be made easily, completely, and consistently. This attribute is more concerned with the format and style of the SRS than with the requirements themselves. Modifiability of an SRS requires the following:

o The SRS has a coherent and easy-to-use organization, with a table of contents, an index, and cross-references where necessary. This allows easy modification of the SRS in future updates.
o The SRS has no redundancy; that is, a requirement does not appear in more than one place in the SRS.

Redundancy itself is not an error; it is a technique that can be used to improve readability, but it can also reduce modifiability. The difficulty is that a requirement may be repeated inside another requirement to make the second requirement clearer. If the first requirement then needs to be changed, there are two locations to change, and if the change is made in only one of them the SRS becomes inconsistent. Although a cross-reference can help to alleviate this problem, redundancy should not be used unless absolutely necessary; if a reference to a previous requirement is necessary, make it by paragraph reference.

9. Is the SRS traceable? An SRS is traceable if the origin of each of its requirements is clear and if it facilitates the referencing of each requirement in future development or enhancement documentation.

Each requirement should be contained in a single, numbered paragraph so that it may be referred to in other documents. This will facilitate backward traceability to source documents and forward traceability to software design documents and test cases. Preferably the requirement should be titled as well. Grouping of several requirements in a single paragraph should be avoided, unless interrelated requirements cannot be separated and still provide clarity. There are two types of traceability:

o Backward traceability implies that we know why every requirement in the SRS exists: each requirement explicitly references its source in previous documents.
o Forward traceability implies that all documents that follow the SRS are able to reference the requirements. This is achieved by giving each requirement in the SRS a unique name or reference number.
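A simple way to support both directions is a requirements traceability matrix. The sketch below is illustrative only; the requirement IDs, source references, and test-case IDs are invented:

from dataclasses import dataclass, field

@dataclass
class Requirement:
    req_id: str                      # unique reference number, e.g. "REQ-3.1.2"
    text: str
    source: str                      # originating document/paragraph (backward)
    test_cases: list[str] = field(default_factory=list)   # artifacts (forward)

requirements = [
    Requirement("REQ-3.1.2", "The system shall respond to input within 5 seconds.",
                source="Customer Needs Document, para. 4.2", test_cases=["TC-017"]),
    Requirement("REQ-3.2.4", "The system shall log every failed login attempt.",
                source="Security Policy, para. 2.1", test_cases=[]),
]

def untraced_forward(reqs):
    """Requirements that no test case references yet (a forward-traceability gap)."""
    return [r.req_id for r in reqs if not r.test_cases]

if __name__ == "__main__":
    print("Requirements without test coverage:", untraced_forward(requirements))

In practice such a matrix is usually maintained in a requirements-management tool or a spreadsheet, but the idea is the same: every requirement carries a unique ID, a pointer back to its source, and pointers forward to the artifacts that implement and verify it.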

10. Does the SRS specify design goals? Design goals should be used with discretion. All requirements must be met even if a design goal is not satisfied, and quantitative design goals must have a requirement associated with them.

11. Does the SRS use the appropriate specification language? The use of "shall" statements is encouraged, as "shall" expresses a directive and identifies what is mandatory. Using "shall" requires a sense of commitment; unwillingness to make such a commitment may indicate that the requirement, as currently stated, is too vague or is stating "how" rather than "what". "Will" statements imply a desire, wish, intent, or declaration of purpose, while "should" and "may" are used to express non-mandatory provisions. It is recommended that "will", "should", and "may" not be used in writing requirements unless absolutely necessary.
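As a rough illustration of how this specification-language guidance could be screened automatically, here is a minimal sketch that flags requirements lacking a "shall" statement or containing non-mandatory verbs. The requirement texts are invented, and such a check only supplements human review:

import re

# Non-mandatory verbs to flag when reviewing specification language.
NON_MANDATORY = ("will", "should", "may")

# Invented requirement statements used only to demonstrate the check.
requirements = {
    "REQ-001": "The system shall log every failed login attempt.",
    "REQ-002": "The system should display an error message on invalid input.",
}

def check_specification_language(reqs: dict[str, str]) -> list[str]:
    findings = []
    for req_id, text in reqs.items():
        if not re.search(r"\bshall\b", text, flags=re.IGNORECASE):
            findings.append(f"{req_id}: no 'shall' statement found")
        for word in NON_MANDATORY:
            if re.search(rf"\b{word}\b", text, flags=re.IGNORECASE):
                findings.append(f"{req_id}: uses non-mandatory '{word}'")
    return findings

if __name__ == "__main__":
    for finding in check_specification_language(requirements):
        print(finding)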

APPENDIX A
Software Requirements Specification (SRS) Summary Checklist
1. Is the SRS correct?
2. Are the requirements in the SRS unambiguous, precise, and clear?
3. Is the SRS complete?
4. Is the SRS verifiable or testable?
5. Is the SRS consistent?
6. Does the SRS deal only with the problem?
7. Is each requirement in the SRS feasible?
8. Is the SRS modifiable?
9. Is the SRS traceable?
10. Does the SRS specify design goals?
11. Does the SRS use the appropriate specification language?

APPENDIX B
Checklist of Words and Grammatical Constructs Prone to Ambiguity

o Incomplete lists, typically ending with "etc.", "and/or", or "TBD".
o Vague words and phrases, such as "generally", "normally", "to the greatest extent", and "where practicable".
o Imprecise verbs, such as "supported", "handled", "processed", or "rejected".
o Implied certainty, such as "always", "never", "all", or "every".
o Passive voice, such as "the counter is set" (by whom?).
o Pronouns: every pronoun, particularly "it" or "its", should have an explicit and unmistakable referent.
o Comparatives, such as "earliest", "latest", "highest"; words ending in "est" or "er" should be suspect.
o Words whose meanings are subject to different interpretations between the customer and contractor, such as: instantaneous, simultaneous, achievable, complete, finish, degraded, a minimum number of, nominal/normal/average, peak/minimum/steady state, as required/specified/indicated, coincident/adjacent/synchronous with.

APPENDIX C
Nonquantifiable Measures
These words signify non-quantifiable measures whose presence can indicate that a requirement cannot be verified or tested:

o Flexible
o Modular
o Efficient
o Adequate
o Accomplish
o Possible (possibly/correct(ly))
o Minimum required/acceptable/reasonable
o Better/higher/faster/less/slower/infrequent
o Some/worst
o Usually/often
o To the extent specified
o To the extent required
o To be compatible/associated with
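Word checklists like those in Appendices B and C lend themselves to a simple automated screen before a formal review. The following is a minimal sketch, assuming an abbreviated word list and invented requirement texts; it supplements, rather than replaces, a human reviewer's judgment:

import re

# Abbreviated sample of ambiguity-prone / non-quantifiable words in the
# style of Appendices B and C; a real checklist would be much longer.
SUSPECT_WORDS = [
    "etc", "and/or", "tbd", "generally", "normally", "where practicable",
    "supported", "handled", "processed", "always", "never",
    "usually", "often", "flexible", "efficient", "adequate", "as required",
]

# Invented requirement statements used only to demonstrate the scan.
requirements = {
    "REQ-001": "The system shall normally respond to queries quickly, etc.",
    "REQ-002": "The system shall respond to a query within 5 seconds.",
}

def scan_for_suspect_words(reqs: dict[str, str]) -> dict[str, list[str]]:
    """Return, per requirement, the checklist words it contains."""
    findings: dict[str, list[str]] = {}
    for req_id, text in reqs.items():
        hits = [w for w in SUSPECT_WORDS
                if re.search(rf"\b{re.escape(w)}\b", text, flags=re.IGNORECASE)]
        if hits:
            findings[req_id] = hits
    return findings

if __name__ == "__main__":
    for req_id, hits in scan_for_suspect_words(requirements).items():
        print(f"{req_id}: review wording for {', '.join(hits)}")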

NOTES ON THE SOFTWARE INSPECTION:

Overview and Frequently Asked Questions

Prepared by Craig K. Tyran

Overview

Software plays a major role in modern organizations, as it is used to run the computer-based systems which collect, store, transform, and organize the data used to conduct business. Unfortunately, the development of software is a major headache for organizations. Frequently software is delivered late and does not meet user requirements due to defects. Many software problems may be attributed to the development of low-quality software that is characterized by numerous defects.

It has been suggested by software experts that the way to address these types of software problems is to improve software quality through quality assurance methods. Quality improvements will reduce defects and thus speed up development time (due to less rework required prior to release) and decrease the costs and personnel demands associated with program maintenance (due to fewer defects in the released software).

One of the best-known software quality techniques is the software inspection. A software inspection is a group meeting which is conducted to uncover defects in a software "work product" (e.g., requirements specification, user interface design, code, test plan). The software inspection approach is a formally defined process involving a series of well-defined inspection steps and roles, a checklist to aid error detection, and the formal collection of process and product data.

Software inspections are conducted in industry because they have been found to be an effective way to uncover defects. The detection rate for an inspection varies depending on the type of work product being inspected and the specific inspection process used. Studies have found that 30 to 90 percent of the defects in a work product may be uncovered through the inspection process. Early detection of defects can lead to cost savings. For example, one study has estimated that inspection-based techniques at Hewlett-Packard have yielded a cost saving of $21 million. It should be noted that inspections may take up a significant portion of a project's time and budget if performed on a consistent basis throughout the life of a project. According to an industry estimate from AT&T, project teams may devote four to fifteen percent of project time to the inspection process. While allocating this amount of time to inspections may seem high, the benefit of reducing software defects has been found to outweigh the cost of conducting inspections.

The favorable attitude that many in the software development field have toward the inspection process is underscored by a statement made by Barry Boehm (a well known expert in the field of systems development), who has written that "the [software inspection] has been the most cost effective technique to date for eliminating software errors."

Software Inspections: Frequently Asked Questions What is a software inspection?

A group review of a software "work product" for the purpose of identifying defects or deficiencies.

What is meant by a "work product"?

A work product refers to the models, designs, programs, test plans, etc. that are generated during the systems development process. Examples of work products corresponding to some of the different system life cycle phases include:
o Analysis phase: DFDs, ER diagrams, project dictionary entries, process specifications
o Design phase: input/output forms, reports, database design, structure charts
o Implementation phase: program code, test plan for code

What is meant by "defects and deficiencies"?



Anything that may ultimately hinder the quality of the final software product is considered a defect. Defects may happen throughout the development process. For example, errors regarding DFDs and the project dictionary may occur during the analysis phase; poor user interfaces may be generated during the design phase; and "buggy" code and incomplete testing plans may be generated during the implementation phase.

Why perform software inspections?

Inspections are justified on an economic basis. The earlier that a software defect can be identified, the cheaper it is to fix. For example, studies show that a defect that is found during system testing can be 10 to 100 times more expensive to fix than defects found early in the development life cycle (e.g., analysis and design).

Data collected during the inspection process (e.g., the types and rates of defects) can be used to help improve the software development process. For example, if software developers in an organization tend to make the same types of errors on multiple projects, the organization may wish to understand the root cause for these problems and take action to improve the development process. Inspections can also help to improve communication and learning:
o Among IS personnel
o Between IS and user personnel
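As a small illustration of how those defect types and rates might be analyzed, the sketch below tallies inspection findings by defect type and life-cycle phase; the defect-log entries are invented for the example:

from collections import Counter

# Invented inspection findings: (life-cycle phase, defect type)
defect_log = [
    ("analysis", "ambiguous requirement"),
    ("analysis", "missing requirement"),
    ("design", "interface mismatch"),
    ("implementation", "logic error"),
    ("implementation", "logic error"),
]

# Tally defects by type and by phase to highlight recurring problem areas.
by_type = Counter(defect_type for _, defect_type in defect_log)
by_phase = Counter(phase for phase, _ in defect_log)

if __name__ == "__main__":
    print("Most common defect types:", by_type.most_common(3))
    print("Defects per phase:", dict(by_phase))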

What is the "output" of an inspection review?

An "Action List" of the errors/deficiencies that need to be fixed. This list is then passed onto the person who produced the work product.

Does a software inspection entail error detection AND error correction, or just error detection?

Only error detection. Error correction is delegated to the person who produced the work product.

Are inspections typically used for purposes of performance appraisal?

No. Organizations that conduct inspections have found that it is best to use the inspection process to improve the quality of a work product and not use the inspection for performance appraisal. They have found that if inspections are used to formally track performance of IS personnel for performance appraisals, then the inspections become too much of a high-stress activity and some personnel do not want to participate.

What is a good attitude for inspectors to have during an inspection?


Consider the work product guilty until proven innocent. However, the producer is always innocent (i.e., focus on the product and not the person who developed the product).

When should an inspection be conducted?

When a unit of work or documentation is completed and is still a "bite-size" piece, an inspection should be scheduled. For example, during analysis it may be appropriate to conduct an inspection once the first draft of the DFDs for a subset of the project has been completed. During programming, an inspection of code could be conducted once a new module has been coded. Conduct inspections frequently to find errors as early as possible.

Who participates in an inspection?

IS personnel and users may be involved. The selection of participants depends on the purpose of the inspection. For example, code inspections typically involve only IS personnel. However, an inspection concerning analysis and design work products may involve the users. In the latter case, prior to a review with the user, there may be an internal MIS review involving only IS personnel where the work product is checked to make sure it is accurate and clear. Once the work product is "in shape" to be viewed by users, an external review and inspection involving both IS and user personnel may take place (to determine if the system supports the user needs).

What may be some of the potential problems that happen during inspection meetings?

Group inspection meetings may be unproductive due to sidetracking and domination by certain group members. If inspectors are not "polite", inspections may lead to excessive criticism and conflict. There is also the chance that inspectors get bogged down trying to correct problems rather than simply identifying them.

How can these potential problems be addressed?

To help address some of the potential problems with group inspections, successful inspection teams follow a set of guidelines/ground rules. Specific guidelines include:

o The number of participants should be manageable (a range of 3-6 people).
o Emphasize error detection, not correction.
o Participants should be "charitable critics" (don't be nasty).
o Follow organizational standards (e.g., diagramming and naming conventions) to reduce misunderstandings or disagreements.
o Recognize that there may be some "open issues" that cannot be resolved at the inspection meeting (e.g., when inspecting an analysis document, there may be questions that need to be referred to the user).
o Keep the length of an inspection to less than two hours to reduce the fatigue factor.

Sources:
Ackerman, A., et al. "Software inspections: An effective verification process," IEEE Software, May 1989, 31-36.
Boehm, B. "Industrial software metrics top ten list," IEEE Software, September 1987, 84-85.
Ebenau, R. and Strauss, S. Software Inspection Process, McGraw-Hill, New York, New York, 1994.
Gilb, T. and Graham, D. Software Inspection, Addison-Wesley, Wokingham, England, 1993.
Grady, R. and Van Slack, T. "Key lessons in achieving widespread inspection use," IEEE Software, 11(4), July 1994.

************************* T H E E N D *************************
