
Architecture, Evolution, and Cloning of Linux Device Drivers: A Case Study

Davor Svetinovic and Michael Godfrey
Software Architecture Group (SWAG)
Department of Computer Science, University of Waterloo
Waterloo, Ontario, N2L 3G1
email: {dsvetino,migod}@swag.uwaterloo.ca

Abstract

In the last few years, a lot of research has been done in the areas of software architecture, evolution, and cloning. These areas have been studied separately, and case studies and experiments have been performed on large systems. The target of our study in these areas has been device drivers, as a part of a large system: the Linux operating system. The analysis of many relatively small and separate systems (device drivers), within the context of a large system (the operating system), has given us new results on dependencies among and within these areas, and on how they affect each other. In this paper, we present the results obtained in these areas, together with the relationships and dependencies among them. All the results were obtained using widely available reverse engineering tools, which gives a picture of their usability in an industrial setting. We first present the results obtained within each of the three areas separately, together with an evaluation of the tools and techniques used to obtain them, and then show how these results fit together.

Keywords: Software architecture, software evolution, reverse engineering, code duplication detection

1 Introduction

The work and results presented in this paper are a continuation of the work presented in [7, 2]. These papers described the architecture and evolution of the Linux operating system kernel. The focus of both papers was on issues down to the level of major subsystems; the architecture and evolution within these subsystems were not analyzed. Especially interesting results were discovered about the device drivers subsystem [7]. This is why we decided to analyze the drivers subsystem in general, and its SCSI subsystem in particular as a typical representative.

We first analyzed the architecture and evolution of the SCSI subsystem. During the study of the subsystem's evolution, we discovered that the main reason for the large increase in the code body of the subsystem is intentional code duplication (cloning). This led to an investigation of cloning within the subsystem, which in turn led to the result that the architecture of the subsystem directly encourages cloning. This dependency led to the discovery of other interesting results, which we present later.

The paper is organized as follows. In the first part, we discuss related work in software architecture, evolution, and cloning. In the second part, we discuss the main aspects of the architecture and evolution of the Linux kernel. Then, we present the results obtained in our study of the drivers subsystem and discuss the performance of the tool used for clone detection relative to our manual clone detection. Lastly, we discuss the relationships among the results obtained in all three areas, and present future work.

2 Related Work

2.1 Software Architecture

In the last few years, a lot of research has been done in the area of software architecture. The effects of the architecture on the quality, understandability, and maintainability of a system have been widely recognized, and a lot of effort has been put into improving the architectures of both new and legacy systems. Basic theory on software architecture is presented in [15, 18]. In this paper, we present the reverse engineering of the conceptual architecture of the subsystem.

2.2 Software Evolution

Even though the concept of software evolution is older than that of software architecture, software evolution is less researched and we have a smaller body of knowledge about it. Lehman et al. have built the largest and best-known body of research on the evolution of large, long-lived software systems [11, 12, 13, 19]. Lehman's laws of software evolution [13], which are based on his case studies of several large software systems, suggest that as systems grow in size, it becomes increasingly difficult to add new code unless explicit steps are taken to reorganize the overall design. Turski's statistical analysis of these case studies suggests that system growth (measured in terms of numbers of source modules and number of modules changed) is usually sub-linear, slowing down as the system gets larger and more complex [12, 19].

Gall et al. examined the evolution of a large telecom switching system both at the system level and within the top-level subsystems [5], much as was done with Linux [7]. They noted that while the system-level evolution seems to conform to the traditionally observed trend of reduced change rates over time, the major subsystems may behave quite differently from the system as a whole. In their case study, they found that some of the major subsystems exhibited interesting evolutionary behaviors, but that these behaviors cancelled each other out when the full system was viewed at the top level. They argue that it is therefore not enough to consider evolution from the topmost level; one must also be concerned with the individual parts. The investigations in [7] strongly support this view, too. This is one of the reasons why we have done the work presented in this paper.

Kemerer and Slaughter have presented an excellent survey of research on software evolution [8]. They also note that there has been relatively little empirical research on software evolution.

Parnas has used the metaphor of decay to describe how and why software becomes increasingly brittle over time [14]. Eick et al. extend the ideas suggested by Parnas by characterizing software decay in ways that can be detected and measured [6]. They used a large telephone switching system as a case study. They suggest, for example, that if it is common for defect fixes to require changes to large numbers of source files, then the software system is probably poorly designed. Their metrics are predicated on the availability of detailed defect tracking logs that allow, for example, a user to determine how many defects have resulted in modifications to a particular module.

Perry presented evidence that the evolution of a software system depends not only on its size and age but also on factors such as the nature of the system itself (i.e., its application domain), previous experience with the system, and the processes, technologies, and organizational frameworks employed by the company that developed the software [16]. In this paper, we discuss how the architecture and type of the drivers subsystem affect its evolution.

2.3 Cloning

Code duplication is considered to be one of the worst programming practices because of its effects during software maintenance. The problem has attracted many researchers, who have done extensive work on techniques for automatic clone detection and removal. There are many reasons why programmers introduce duplicated code [1, 10]. The main reasons are:

- It is easier and faster to copy and paste code than to write it from scratch.
- It is often assumed that the code one duplicates is already tested and works, and that new code would introduce new bugs.
- Programmers are often evaluated based on the amount of code they produce.
- Sometimes it is more efficient to copy code than to introduce additional method invocations.
- There is a general belief that all forms of reuse are good.

The main problems that appear due to duplicated code are:

- If the code that is duplicated contains bugs, then they are propagated throughout the system. It is well known how many resources it takes to discover bugs during maintenance, so this is a severe problem.
- The size of the code body is increased by the code duplication, which increases compilation time and the size of the executable.
- The size of the code body makes it more difficult to understand the code.
- Bad design.

Any benefits introduced by code duplication are by far outweighed by the problems it introduces. Therefore, techniques for clone management, detection, and removal are needed.

3 Linux Architecture and Evolution

Linux is a Unix-like operating system originally written by Linus Torvalds, but subsequently worked on by hundreds of other developers [3]. It was originally written to run on the Intel 386 architecture, but has since been ported to numerous other platforms, including the PowerPC, the DEC Alpha, the Sun SPARC and SPARC64, and even mainframes and PDAs.

The first official release of the kernel, version 1.0, occurred in March 1994. This release contained 487 source code files comprising over 165,000 lines of code (including comments and blank lines). Since then, the Linux kernel has been maintained along two parallel paths: a development release containing experimental and relatively untested code, and a stable release containing mostly updates and bug fixes relative to the previous stable release. By convention, the middle number in a kernel version identifies the path to which it belongs: an odd number (e.g., 1.3.49) denotes a development kernel, and an even number (e.g., 2.0.7) denotes a stable kernel. At the time of writing (April 2001), the most recent stable kernel is version 2.4.3. This release contains 7108 source code files comprising 3,143,627 lines (including comments and blank lines), of which 2,201,471 are lines of code and 729,457 are lines of comment.

The Linux system consists of 10 major subsystems [17]. These subsystems are clearly visible in the source code directory structure. This is the architecture as presented through the development view [9], which is the view of interest to us since we are studying architectural evolution at the code level. These subsystems are:

- drivers contains a large collection of drivers for various hardware devices.
- arch contains the kernel code that is specific to particular hardware architectures/CPUs, including support for memory management and libraries.
- include contains most of the system's include (.h) files.
- net contains the main networking code, such as support for sockets and TCP/IP (code for particular networking cards is contained in the drivers/net subsystem).
- fs contains support for various kinds of file systems.
- init contains the initialization code for the kernel.
- ipc contains the code for inter-process communication.
- kernel contains the main kernel code that is architecture independent.
- lib contains the (architecture independent) library code.
- mm contains the (architecture independent) memory management code.

The most significant results about the evolution of Linux, presented in [7], are:

- Super-linear growth at the system level, which is surprising given (a) the large size of the system, (b) that it is built using an open-source development model (a highly collaborative and geographically distributed set of developers, many of whom contribute their time and effort for free), and (c) previously published research suggesting that the growth of large software systems tends to slow down as the systems become larger [5, 12, 19].
- Black-box examination is not enough; one must investigate the nature of the subsystems and explore their evolutionary patterns to gain an understanding of how and why the system as a whole has evolved.
- More than half of the code consists of device drivers, which are relatively independent of each other and have the largest influence on the increase in the size of the system.
- A large part of the system consists of parallel features that are specific to particular CPUs. The addition of these features has produced several jumps in the size of the system.
- The core kernel subsystems comprise only a small part of the full source tree, and do not affect the growth of the system significantly.
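The version-numbering convention described in this section (an even middle number denotes a stable kernel, an odd one a development kernel) can be sketched in a few lines of C. The function name is_stable_release is a hypothetical helper introduced only for illustration, and the sketch assumes version strings of the form major.minor.patch as used by the releases discussed here.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the kernel or of the study's tooling):
 * classifies a version string "major.minor.patch" by the parity of its
 * middle number.
 * Returns 1 for a stable release, 0 for a development release,
 * and -1 if the string cannot be parsed as three non-negative numbers.
 */
int is_stable_release(const char *version)
{
    int major, minor, patch;

    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;
    if (major < 0 || minor < 0 || patch < 0)
        return -1;

    /* Even middle number: stable path; odd: development path. */
    return (minor % 2 == 0) ? 1 : 0;
}
```

Applied to the examples in the text, is_stable_release("1.3.49") yields 0 (development) and is_stable_release("2.0.7") yields 1 (stable).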

4 SCSI Subsystem

The drivers subsystem consists of 11 subsystems. We decided to study the SCSI subsystem as a representative, since it is large compared to the other drivers subsystems and exhibits properties common to all of them [7].

The Small Computer Systems Interface (SCSI) is a standard for connecting devices to a computer via a standard hardware interface. SCSI supports almost all device types (disks, scanners, etc.). It has many advantages over IDE/ATA, such as reliability and speed.

4.1 Architecture

There are three important structures used in the subsystem: Scsi_Host, Scsi_Host_Template, and scsi_cmnd. Scsi_Host represents the host adapter and its state. Scsi_Host_Template is used by the upper levels of the subsystem to access the adapter in a uniform way, independent of its type; there is only one instance of it for every host device type in the system. The scsi_cmnd structure represents a command that is executed by a device driver. The upper levels of the subsystem represent the command as a scsi_cmnd structure and pass it through the Scsi_Host_Template structure to the command function of the device driver, which then executes it.

Quasi-object-oriented techniques are used to implement these structures. Pointers to functions are used to represent methods. Visibility is indicated by a private keyword placed in comments (even though the compiler cannot enforce it, it is supposed to be respected by programmers who access these fields). The comments also state what should be accessible to which layer of the subsystem, and this is supposed to be respected as well. It would be interesting to see how much this is actually respected.

The functions a driver must implement are defined in the Scsi_Host_Template interface. The main ones, which all drivers must implement, are:

- int proc_info(char *, char **, off_t, int, int, int) - This function is used to export driver statistics and other information to the world outside the kernel. It is also used to fill the driver with information.
- int detect(struct SHT *) - This function is used to initialize all data necessary for the SCSI driver. It scans for the SCSI host and registers it with the kernel. When it is done, the host is issued commands to find all the attached SCSI devices.
- const char *info(struct Scsi_Host *) - This function returns information about the adapter. The kind and amount of information returned is determined by the programmer.
- int command(Scsi_Cmnd *) - This function accepts the command encoded in the Scsi_Cmnd structure by the upper levels of the system, executes it, and then waits for the command to finish. Very often, the command is passed to the function queuecommand.
- int queuecommand(Scsi_Cmnd *, void (*done)(Scsi_Cmnd *)) - This function executes the command encoded in the Scsi_Cmnd structure, but the system does not wait for it to finish. When the command is done, the driver sends a message to the system.
- int abort(Scsi_Cmnd *) - This function is used to abort the command if the system determines that the command is not performing as it should (for example, the time period in which the command should have finished has passed).
- int reset(Scsi_Cmnd *, unsigned int) - This function is used to reset the SCSI bus if the error cannot be handled by the abort function.
- int bios_param(Disk *, kdev_t, int []) - This function determines the BIOS parameters for a hard disk.

Besides the main functions, there are some locally visible functions and data fields that support the main functionality and handle driver specifics (get_command, show_command, etc.). These functions are used exclusively at the low level, since they are not part of the interface towards the rest of the system. None of these functions should be used directly by code outside of the SCSI subsystem, but only through the Upper Layer (explained next).

Up to now, we have presented the low-level structure of a driver and the basic data structures. At a higher level of abstraction, there are three distinct layers:

- The Upper Layer is responsible for converting requests made to the SCSI subsystem and passing them through the Middle Layer to the actual driver that executes the commands. It has four subsystems, depending on the type of the device: the Disk subsystem is responsible for hard disks, the CD-ROM Disk subsystem for CD-ROMs, Tape for tapes, and Generic for other devices such as scanners. All requests to the devices should go through this layer.
- The Middle Layer contains data structures and other functionality that provide a bridge between the Upper Layer and the Low Level Drivers. The previously mentioned data structures belong to this layer.
- The Low Level Drivers are responsible for initialization of and interaction with the actual devices, and for the execution of commands received from the Middle Layer. This layer is partitioned into subsystems depending on which family of products a driver supports. All the drivers implement a subset of the interface imposed by the upper layers.

Upper Layer
  Disk  sd.c sd.h sd_ioctl.c
  CD-ROM Disk  sr.c sr.h sr_ioctl.c sr_vendor.c
  Generic  sg.c sg.h
  Tape  st.c st.h st_options.h

Middle Layer
  constants.c scsi.h scsi_ioctl.c scsi_proc.c constants.h scsi.h scsi_ioctl.h scsi_queue.c hosts.c scsi_debug.c scsi_module.c scsi_syms.c hosts.h scsi_debug.h scsi_obsolete.c scsicam.c scsi.c scsi_error.c scsi_obsolete.h scsicam.h

Low Level Drivers
  3ware  3w-xxxx.c 3w-xxxx.h
  Amiga Technolog.  amiga7xx.c amiga7xx.h
  Atari  atari_scsi.c atari_scsi.h
  MAC  mac53c94.c mac53c94.h mac_esp.c mac_esp.h mac_scsi.c mac_scsi.h mesh.c mesh.h
  Qlogic  qlogicfas.c qlogicfas.h qlogicfc.c qlogicfc.h qlogicfc_asm.c qlogicisp.c qlogicisp.h qlogicisp_asm.c qlogicpti.c qlogicpti.h qlogicpti_asm.c
  ACARD  atp870u.c atp870u.h
  ATAPI-SCSI Emulator  ide-scsi.c ide-scsi.h
  IBM  ips.c ips.h ibmmca.c ibmmca.h
  Future Domain  fd_mcs.c fd_mcs.h fdomain.c fdomain.h
  Sparc Storage  pluto.c pluto.h
  SGI  sgiwd93.c sgiwd93.h
  FCAL  fcal.c fcal.h
  Adaptec  aha152x.c aha152x.h aha1542.c aha1542.h aha1740.c aha1740.h aic7xxx.c aic7xxx.h aic7xxx_proc.c aic7xxx_reg.h aic7xxx_seq.c scsi_message.h
  Advanced System Products  advansys.c advansys.h
  Blizzard  blz1230.c blz1230.h blz2060.c blz2060.h
  NCR Microelectronics  Data Technology Corp  dtc.c dtc.h eata.c eata.h eata_dma.c eata_dma.h eata_dma_proc.c eata_dma_proc.h eata_generic.h eata_pio.c eata_pio.h eata_pio_proc.c
  Perceptive  pci2000.c pci2000.h pci2220i.c pci2220i.h psi240i.c psi240i.h psi_chip.h psi_dale.h psi_roy.h 53c7,8xx.c 53c7,8xx.h 53c7xx.c 53c7xx.h 53c8xx_d.h 53c8xx_u.h bvme6000.c bvme6000.h g_NCR5380.c g_NCR5380.h mac_NCR5380.c mca_53c9x.c mca_53c9x.h mvme16x.c mvme16x.h NCR5380.c NCR5380.h NCR53c406a.c NCR53c406a.h ncr53c8xx.c ncr53c8xx.h NCR53C9x.c NCR53C9x.h sim710.c sim710.h sim710_d.h sim710_u.h sym53c416.c sym53c416.h sym53c8xx.c sym53c8xx.h sym53c8xx_defs.h
  Always IN2000  in2000.c in2000.h
  BusLogic  BusLogic.c BusLogic.h FlashPoint.c
  GVP  gvp11.c gvp11.h
  Sparc  esp.c esp.h a2091.c a2091.h a3000.c a3000.h wd33c93.c wd33c93.h
  ICP Vortex Corporation  gdth.c gdth.h gdth_ioctl.h gdth_proc.c gdth_proc.h
  Pro Audio Spectrum 16  pas16.c pas16.h
  Seagate  seagate.c seagate.h
  JAZZ  Iomega  dec_esp.c dec_esp.h jazz_esp.c jazz_esp.h imm.c imm.h ppa.c ppa.h
  AM  AM53C974.c AM53C974.h
  Commodore  Cyberstorm  cyberstorm.c cyberstorm.h cyberstormII.c cyberstormII.h
  Initio  i60uscsi.c i60uscsi.h i91uscsi.c i91uscsi.h ini9100u.c ini9100u.h inia100.c inia100.h
  Trantor  t128.c t128.h
  Western Digital  wd7000.c wd7000.h
  American Megatrends  megaraid.c megaraid.h
  Phase 5  fastlane.c fastlane.h
  Tekram  dc390.h scsiiom.c tmscsim.c tmscsim.h
  Ultrastor  u14-34f.c u14-34f.h ultrastor.c ultrastor.h

Legend: Depends on

Figure 1: Conceptual Architecture of SCSI Subsystem

The conceptual architecture of the subsystem is presented in Figure 1.
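The quasi-object-oriented style described in this section, in which a template structure holds pointers to the functions that each driver supplies, can be illustrated with a deliberately simplified sketch. Everything below (demo_cmnd, demo_host_template, the toy_ functions, and demo_dispatch) is hypothetical and is not the kernel's actual Scsi_Host_Template definition; in particular, the rule that the dispatcher prefers queuecommand when a driver provides it is an assumption made only for the example.

```c
#include <stddef.h>

/* Simplified stand-in for a SCSI command; NOT the kernel's scsi_cmnd. */
typedef struct demo_cmnd {
    unsigned char opcode;   /* command code to execute               */
    int result;             /* status filled in by the driver        */
    int done_called;        /* set when the completion callback runs */
} demo_cmnd;

/* Simplified stand-in for Scsi_Host_Template: methods as function pointers. */
typedef struct demo_host_template {
    const char *name;
    int (*detect)(struct demo_host_template *);
    int (*command)(demo_cmnd *);
    int (*queuecommand)(demo_cmnd *, void (*done)(demo_cmnd *));
} demo_host_template;

/* A toy low-level driver that supplies its own implementations. */
int toy_detect(demo_host_template *tpl)
{
    (void)tpl;
    return 1;               /* pretend exactly one host adapter was found */
}

int toy_command(demo_cmnd *cmd)
{
    cmd->result = 0;        /* pretend the command ran to completion */
    return cmd->result;
}

int toy_queuecommand(demo_cmnd *cmd, void (*done)(demo_cmnd *))
{
    cmd->result = 0;
    if (done != NULL)
        done(cmd);          /* notify the caller that the command finished */
    return 0;
}

demo_host_template toy_driver = {
    "toy", toy_detect, toy_command, toy_queuecommand
};

/* Completion callback used by the dispatcher below. */
void mark_done(demo_cmnd *cmd)
{
    cmd->done_called = 1;
}

/*
 * Upper-level code reaches the driver only through the template, so it
 * does not depend on the adapter type. Returns -1 if the driver supplies
 * neither way of executing a command.
 */
int demo_dispatch(demo_host_template *tpl, demo_cmnd *cmd)
{
    if (tpl->queuecommand != NULL)
        return tpl->queuecommand(cmd, mark_done);
    if (tpl->command != NULL)
        return tpl->command(cmd);
    return -1;
}
```

Because every driver fills in the same table of function pointers, a new driver is most easily written by copying an existing table and its functions and then adjusting them, which is consistent with the cloning behavior discussed later in the paper.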

4.2 Evolution

We have considered 38 stable releases and 67 development releases. Since its first official release, the SCSI subsystem has grown approximately ten times in size. This is an indication of the increasing popularity of Linux, which has resulted in more and more devices being supported by Linux. This popularity is among the users of the system: in the stable release 2.2.16, 29 files out of 211 originated from companies, while the Linux community developed the rest. The community maintains almost all the drivers. Such a small percentage, approximately 14%, suggests that Linux is still not so accepted by the industry and that there is still a long way to go before an average user can get all the support that she gets by buying commercial products.

The SCSI subsystem of release 2.2.16 consists of 80 device drivers, 254,953 lines of code, and 2,512 functions. The average size of a driver is approximately 3,000 lines of code. Large, multi-card drivers have up to 15,000 lines of code.

The first interesting fact that we discovered is that the frequency of new releases is much higher immediately after a new major release is made. Release 1.2.0 was made on March 6, 1995, and by August 1, 1995 there were 13 new releases (including 1.2.13). Almost a year after that, on June 8, 1996, there was a new major release, 2.0.0. In the next six months, there were 27 new releases. In the next three years, there were only ten releases, with approximately three every half of a year. The same pattern continued with the 2.2.x releases. The reason for this is probably the instability of the new technology that is incorporated into every new major release. On the other hand, knowing that there are development releases in parallel, it is surprising to see so many changes as soon as a new stable release is out, since most of the testing should have been done in the development releases.

Figure 2 shows the increase in the number of files, together with the number of deleted, modified, and unchanged files.

Release   Total files   New files   Modified files   Deleted files   Unchanged files
1.0.0          42           42             0               0                 0
1.2.0          64           22            42               0                 0
1.2.13         64            0            27               0                37
2.0.0          98           42            48               8                 0
2.0.27        100            2            43               0                55
2.0.30        101            1            18               0                82
2.0.33        114           14            32               1                68
2.0.34        114            0            37               0                77
2.0.36        116            3            46               1                67
2.0.37        140           24            16               0               100
2.2.0         183           52           108               9                23
2.2.10        197           14            55               0               128
2.2.14        205            8            68               0               129
2.2.16        207            2            33               0               172

Figure 2: Changes at the File Level

The fact that the proportion of modified files is not approximately constant indicates that not only local corrective maintenance is performed, but also more global maintenance that affects many files at the same time. Manually analyzing the files, we observed that most new device drivers are derived from already existing ones and that they are indeed similar to one another, up to the level permitted by the different hardware that the drivers are supposed to control. This can also be a reason for the differences in the number of changed files. If an error is discovered in one of the clones, then corrective changes hopefully propagate to all the other clones, which results in a much higher number of modified files. This implies that most of the functions are clones and that clone detection technology and control would be valuable in the maintenance of the subsystem. We will discuss cloning in more detail later.

To measure the growth of the subsystem, we considered the following metrics:

- Number of files
- Number of lines
- Number of functions
- Number of lines of code
- Number of lines of comment

We measured them for both stable and development releases. The change curve for all the metrics had the same shape. The rate of increase in size is the same as that found in [7] for the whole drivers subsystem. Figure 3 presents the increase in the number of files through the stable releases. Figure 4 presents the increase in the number of files through the development releases. The increase in the number of new files added to the development releases is smooth, since drivers are constantly added and tested there first, while the graph of the stable releases shows jumps, which indicate the addition of a large number of new drivers in new major releases. This shows the practice of incorporating support for a lot of new technology into new major releases, which results in many errors being discovered by new users as soon as a new release is out.

Another interesting behavior that we observed, which also gave us an additional indication of the possibility of widely practiced cloning, is the high amount of comments.
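The file counts of Figure 2 can be cross-checked with a small amount of C. The sketch below copies the release names and the total, new, and deleted counts from Figure 2; the function names and the particular check (each total should equal the previous total, minus deleted files, plus new files) are our own illustrative assumptions, not part of the original measurement tooling.

```c
#include <stddef.h>
#include <string.h>

#define N_RELEASES 14

/* Data copied from Figure 2 (stable releases, in order). */
const char *const release_name[N_RELEASES] = {
    "1.0.0", "1.2.0", "1.2.13", "2.0.0", "2.0.27", "2.0.30", "2.0.33",
    "2.0.34", "2.0.36", "2.0.37", "2.2.0", "2.2.10", "2.2.14", "2.2.16"
};
const int total_files[N_RELEASES] = {
    42, 64, 64, 98, 100, 101, 114, 114, 116, 140, 183, 197, 205, 207
};
const int new_files[N_RELEASES] = {
    42, 22, 0, 42, 2, 1, 14, 0, 3, 24, 52, 14, 8, 2
};
const int deleted_files[N_RELEASES] = {
    0, 0, 0, 8, 0, 0, 1, 0, 1, 0, 9, 0, 0, 0
};

/*
 * Counts the releases (from the second one on) whose total does not equal
 * the previous total minus deleted files plus new files.
 */
int count_bookkeeping_mismatches(void)
{
    int mismatches = 0;

    for (int i = 1; i < N_RELEASES; i++) {
        int expected = total_files[i - 1] - deleted_files[i] + new_files[i];
        if (total_files[i] != expected)
            mismatches++;
    }
    return mismatches;
}

/*
 * Returns the name of the release that added the most new files, ignoring
 * the initial release, in which every file is new by definition.
 */
const char *release_with_largest_addition(void)
{
    int best = 1;

    for (int i = 2; i < N_RELEASES; i++) {
        if (new_files[i] > new_files[best])
            best = i;
    }
    return release_name[best];
}
```

On this data the check finds no mismatches, and the largest single addition of files occurs at the major release 2.2.0, in line with the jumps at major releases noted in the text.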

[Figure 3: Increase in Number of Files - Stable Releases. Plot of the number of files (0 to 250) against release year.]

[Figure 4: Increase in Number of Files - Development Releases. Plot of the number of files (0 to 300) against release year.]

After more variation in the early period of subsystem development, the amount of comments has stabilized at approximately 40%. It is surprising to see such a high amount of comments, knowing that writing comments is not usually a programmer's favorite thing to do, especially in open-source development, and knowing that the drivers are relatively self-contained and small, and therefore easier to maintain than some other parts of the code.

[Figure 5: Percentage of Comments - Stable Releases. Plot of the percentage of comments (0 to 50) against release year.]

[Figure 6: Percentage of Comments - Development Releases. Plot of the percentage of comments (0 to 60) against release year.]

In conclusion, the subsystem has shown the following interesting evolution characteristics:

- Linear increase in the size of the subsystem, which is equivalent to the increase in the size of the system as a whole.
- Jumps in the size of the subsystem at the points of major releases, due to the number of new drivers added. We did not expect this behavior, given that the drivers are self-contained and can be added to the subsystem at any point. This is also an indication of how evolutionary changes can be used to detect development practices.
- Many more changes to the system immediately after the major releases. A possible explanation is the larger discovery of errors due to the larger number of users. This also contradicts the intuitive expectation of users that the major releases are more stable and reliable.
- Variable number of changed files from release to release. This gave us the indication that some errors/modifications involve changes to multiple files, which is surprising given that drivers are self-contained; it also gave us an indication of possible cloning and clone management.
- High percentage of comments and its decrease over time. This discovery made us believe that a lot of comments are also cloned.

4.3 Cloning

Besides the clone indications we have already mentioned, the following architectural and development factors also indicated the possibility of a large number of clones:

- Every driver must implement a uniform interface.
- The design of the subsystem does not support other forms of reuse.
- Driver logic is relatively simple.
- Devices from the same family: more cloning.
- Completely different hardware: less or no cloning.
- Open source: anyone can reuse someone else's code.
- It is easier and more efficient to reuse existing code.

We manually analyzed the drivers for the occurrence of clones, and we found a high percentage of very similar code. The most similar code was the code responsible for the initialization of the drivers. High similarity was found in the main functions, too. The main changes in the discovered clones were (a) changed names of variables, (b) changed initialization parameters and constants, (c) driver-specific initialization logic removed or added, and (d) updated comments. We found small changes in the supporting functions and in the general driver management code. We also found that all these changes are highly embedded in the code, which makes it hard to extract the code that has changed.

The most surprising discovery was that of commented cloning. The purpose of these comments was to acknowledge someone else's work, not intentional clone management. Nevertheless, this allowed us to identify the main groups of cloned drivers: the ones where code similarity is definitely due to cloning, and not due to the simplicity of the code. Figure 7 presents the main groups of the cloned drivers.
[Figure 7: Documented Groups of Cloned Drivers. The figure shows a hierarchy of eight drivers (esp.[ch], jazz_esp.[ch], cyberstorm.[ch], dec_esp.[ch], cyberstormII.[ch], mca_53c9x.[ch], blz2060.[ch], fastlane.[ch]) and the further drivers qlogicisp.[ch], qlogicpti.[ch], fdomain.[ch], fd_mcs.[ch], sd.[ch], sr.[ch], t128.[ch], and pas16.[ch].]

In order to find the exact amount of cloned code and the types of clones, we looked for a widely available tool. We wanted a tool that anyone can easily acquire and apply in an industrial setting, which would allow us to evaluate the applicability of such tools. The only one we found that was available as an evaluation version and supported clone detection in C code was Clone Finder [4]. The tool was very user friendly, and its most interesting feature was the detection and grouping of clones. These clones were visually presented and compared. The techniques that the tool uses for clone detection were not specified.

When we performed the analysis using the tool, we found surprising results. The tool did not detect several obvious clones that we had previously found by manual inspection. Also, the amount of clones found was very low, approximately 10 percent. The results obtained on the most interesting grouping of cloned drivers (the 8 drivers that form the hierarchy in Figure 7) are:

- Number of files scanned: 8
- Number of source lines: 4081
- Elapsed time in seconds: 0.44
- Number of groupings: 14
- Number of blocks within those groupings: 30
- Total number of duplicated lines: 373
- Percent of source lines which are duplicated: 9.14
- Number of groupings not shown since trial edition: 13

One of the obvious cloned segments of code that is not detected is shown in Figure 8. It is immediately after the detected segment, which is shown in red.

cyberstorm.c

    ...
    static void dma_dump_state(struct NCR_ESP *esp)
    {
        ESPLOG(("esp%d: dma -- cond_reg<%02x>\n", esp->esp_id,
                ((struct cyber_dma_registers *) (esp->dregs))->cond_reg));
        ESPLOG(("intreq:<%04x>, intena:<%04x>\n",
                custom.intreqr, custom.intenar));
    }

    static void dma_init_read(struct NCR_ESP *esp, __u32 addr, int length)
    {
        struct cyber_dma_registers *dregs =
            (struct cyber_dma_registers *) esp->dregs;

        cache_clear(addr, length);

        addr &= ~(1);
        dregs->dma_addr0 = (addr >> 24) & 0xff;
        dregs->dma_addr1 = (addr >> 16) & 0xff;
        dregs->dma_addr2 = (addr >>  8) & 0xff;
        dregs->dma_addr3 = (addr      ) & 0xff;
        ctrl_data &= ~(CYBER_DMA_WRITE);
        ...

cyberstormII.c

    ...
    static void dma_dump_state(struct NCR_ESP *esp)
    {
        ESPLOG(("esp%d: dma -- cond_reg<%02x>\n", esp->esp_id,
                ((struct cyberII_dma_registers *) (esp->dregs))->cond_reg));
        ESPLOG(("intreq:<%04x>, intena:<%04x>\n",
                custom.intreqr, custom.intenar));
    }

    static void dma_init_read(struct NCR_ESP *esp, __u32 addr, int length)
    {
        struct cyberII_dma_registers *dregs =
            (struct cyberII_dma_registers *) esp->dregs;

        cache_clear(addr, length);

        addr &= ~(1);
        dregs->dma_addr0 = (addr >> 24) & 0xff;
        dregs->dma_addr1 = (addr >> 16) & 0xff;
        dregs->dma_addr2 = (addr >>  8) & 0xff;
        dregs->dma_addr3 = (addr      ) & 0xff;
    }
    ...

Figure 8: Clone Segments Detected Using Clone Finder [4]

This left us doubtful about the current state of practice in clone detection and its usability in industrial development. The strongest influence on code cloning in this case was the architecture of the subsystem. The evolution of the subsystem caused uncontrolled propagation of the cloned code throughout the subsystem. One example is the clone hierarchies. As time passes, the code changes in different clones, which makes these clones more different from each other and harder to detect. In order to examine the types of changes in cloned code, and how different architectural decisions and evolution forces affect them, we will need more powerful clone detection and removal tools.
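The kinds of changes found manually in the clones (renamed variables, changed constants) suggest why purely textual comparison can miss them. One common idea, sketched below, is to normalize each line before comparing: whitespace is dropped, every identifier or keyword becomes the placeholder I, and every numeric literal becomes N. This is only an illustrative assumption about how such a comparison could work; it is not the technique used by Clone Finder, whose method was not specified, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Writes a normalized copy of src into dst (capacity cap, including the
 * terminating NUL). Whitespace is dropped, each identifier or keyword is
 * replaced by 'I', each numeric literal by 'N', and all other characters
 * are copied unchanged. Returns 0 on success, -1 if dst is too small.
 */
int normalize_line(const char *src, char *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    if (cap == 0)
        return -1;

    while (src[i] != '\0') {
        unsigned char ch = (unsigned char)src[i];
        char token;

        if (isspace(ch)) {
            i++;
            continue;
        }
        if (isalpha(ch) || ch == '_') {
            /* Skip the whole identifier or keyword. */
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                i++;
            token = 'I';
        } else if (isdigit(ch)) {
            /* Skip the whole literal, including forms such as 0xff. */
            while (isalnum((unsigned char)src[i]))
                i++;
            token = 'N';
        } else {
            token = (char)ch;
            i++;
        }
        if (out + 1 >= cap)
            return -1;      /* no room left for this token plus the NUL */
        dst[out++] = token;
    }
    dst[out] = '\0';
    return 0;
}

/* Returns 1 if the two lines are identical after normalization, else 0. */
int lines_match_after_normalization(const char *a, const char *b)
{
    char na[256];
    char nb[256];

    if (normalize_line(a, na, sizeof na) != 0 ||
        normalize_line(b, nb, sizeof nb) != 0)
        return 0;
    return strcmp(na, nb) == 0;
}
```

With this normalization, the declarations of dregs in cyberstorm.c and cyberstormII.c from Figure 8 compare as equal even though the structure names differ. A real detector would additionally have to group matching lines into blocks and tolerate inserted or removed lines, which this sketch does not attempt.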

5 Conclusions

We presented the architecture and evolution of the Linux SCSI subsystem. We identified code duplication (cloning) as the major factor that affects the evolution of the subsystem. The main source of these clones is the architecture of the subsystem. This showed how these three areas affect each other and that they should be studied in connection with each other.

6 Future Work

We will analyze other driver families, investigate further the relative effectiveness of clone detection tools, and investigate parallel evolution by type of maintenance: (a) bug fixes, (b) new features, and (c) restructuring.

References

[1] I. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier, "Clone Detection Using Abstract Syntax Trees," Proceedings of ICSM, IEEE, 1998.
[2] Ivan T. Bowman, Richard C. Holt, and Neil V. Brewster, "Linux as a Case Study: Its Extracted Software Architecture," ICSE 99: International Conference on Software Engineering, Los Angeles, California, May 1999.
[3] I. T. Bowman and R. C. Holt, "Reconstructing ownership architectures to help understand software systems," Proc. of the 1999 IEEE Workshop on Program Comprehension (IWPC99), Pittsburgh, PA, May 1999.
[4] Clone Finder (Trial Edition), www.studio501.com.
[5] H. Gall, M. Jazayeri, R. Kloesch, and G. Trausmuth, "Software evolution observations based on product release history," Proc. of the 1997 Intl. Conf. on Software Maintenance (ICSM97), Bari, Italy, Oct 1997.
[6] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, "Does code decay? Assessing the evidence from change management data," IEEE Trans. on Software Engineering.
[7] Michael W. Godfrey and Qiang Tu, "Evolution in Open Source Software: A Case Study," Proc. of the 2000 Intl. Conference on Software Maintenance, San Jose, California, October 2000.
[8] C. F. Kemerer and S. Slaughter, "An empirical approach to studying software evolution," IEEE Trans. on Software Engineering, 25(4), July/August 1999.
[9] P. Kruchten, "The 4+1 view model of architecture," IEEE Software, 12(5), November 1995.
[10] J. H. Johnson, "Substring Matching for Clone Detection and Change Tracking," Proceedings of the International Conference on Software Maintenance (ICSM), pages 120-126, 1994.
[11] M. M. Lehman and L. A. Belady, Program Evolution: Processes of Software Change, Academic Press, 1985.
[12] M. M. Lehman, D. E. Perry, and J. F. Ramil, "Implications of evolution metrics on software maintenance," Proc. of the 1998 Intl. Conf. on Software Maintenance (ICSM98), Bethesda, Maryland, Nov 1998.
[13] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski, "Metrics and laws of software evolution - the nineties view," Proc. of the Fourth Intl. Software Metrics Symposium (Metrics97), Albuquerque, NM, 1997.
[14] D. L. Parnas, "Software aging," Proc. of the 16th Intl. Conf. on Software Engineering (ICSE-16), Sorrento, Italy, May 1994.
[15] Dewayne E. Perry and Alexander L. Wolf, "Foundations for the Study of Software Architecture," ACM SIGSOFT Software Engineering Notes, 17(4):40-52, October 1992.
[16] D. E. Perry, "Dimensions of software evolution," Proc. of the 1994 Intl. Conf. on Software Maintenance (ICSM94), 1994.
[17] D. A. Rusling, The Linux Kernel, Website, http://www.linuxhq.com/guides/TLK/tlk.html.
[18] Mary Shaw and David Garlan, Software Architecture: Perspectives on an Emerging Discipline, Prentice Hall Press, April 1996.
[19] W. M. Turski, "Reference model for smooth growth of software systems," IEEE Trans. on Software Engineering, 22(8), Aug 1996.
