
Session S3H

The Evolution of Data Structures


James Harris¹ and Ardian Greca²
Georgia Southern University, Department of Computer Science
P.O. Box 7997, Statesboro, GA 30460

Abstract - For over 20 years, the data structures course has been a pillar of computer science programs at colleges and universities. This paper looks at how the data structures course has evolved over time from a course that emphasized algorithmic concepts to a course that emphasizes syntactical and design concepts. It illustrates how the evolution of programming languages and concepts can introduce “gratuitous” complexity into algorithms. Specific algorithms and abstract data types are compared in past and present data structures texts using a suite of software metrics. A comparison is performed between algorithms from data structures texts across different programming languages and across procedural and object oriented paradigms. The results are compared to provide evidence of how the course has evolved over time.

Index Terms – Abstract Data Types, Data Structures, Software Metrics, Syntactic Complexity.

INTRODUCTION

The data structures course has been a core constituent for many years in computer science programs. While spanning many languages, from FORTRAN to Pascal to C to C++ and now Java, the basic content of the course has remained unchanged. As languages have evolved, there has been a perceived increase in the syntactic complexity of data structure implementations, to the point that the underlying algorithms become obscured. Many object oriented features seem to add to the syntactic complexity without contributing to the functionality of an algorithm. Students are required to learn more complex language syntax and very often the underlying algorithm is lost in the confusion.

The term “gratuitous complexity” was first described in [1] as complexity that “contributes nothing to the task in hand”. For example, in the book “Data Structures and Algorithm Analysis in C++” by Mark Allen Weiss [11], Weiss defines a Node for a linked list in C++ as:

    template <class Object>
    class ListNode
    {
        ListNode( const Object & theElement = Object( ), ListNode * n = NULL )
          : element( theElement ), next( n ) { }

        Object element;
        ListNode *next;

        friend class List<Object>;
        friend class ListItr<Object>;
    };

FIGURE I
A LINKED LIST NODE IMPLEMENTED IN C++

In this small segment of C++ code, there are classes, objects, templates, constructors, default arguments that are used in calls to other data members’ constructors, and friend class declarations which are passed the underlying template type of the ListNode class. Couldn’t the functionality be expressed in a way that does not obscure the underlying process?

In an earlier book by Weiss, “Data Structures and Algorithm Analysis” [10], a node for a linked list is defined in the “C” language as follows:

    typedef struct Node *PtrToNode;
    typedef PtrToNode Position;
    struct Node
    {
        ElementType Element;
        Position Next;
    };

FIGURE II
A LINKED LIST NODE IMPLEMENTED IN C

There is more functionality in Weiss’s C++ version of a node, but how much of the added complexity is gratuitous? For example, in C a node “N” could be initialized as follows:

    N.element.field1 = initvalue1;
    N.element.field2 = initvalue2;
    etc.

This is straightforward and students do not have to focus on understanding a large number of abstract programming concepts in order to comprehend the underlying process. Of course, the fault could be placed on the author. Clever use of syntax very often obscures the underlying process. In his book “Data Structures and Algorithm Analysis in Java” [12], Weiss defines a node for a linked list as follows:
¹ James Harris, Associate Professor of Computer Science, jkharris@georgiasouthern.edu
² Ardian Greca, Assistant Professor of Computer Science, naidrag@IEEE.org
0-7803-8552-7/04/$20.00 © 2004 IEEE October 20 – 23, 2004, Savannah, GA
34th ASEE/IEEE Frontiers in Education Conference
S3H-9
    class ListNode
    {
        // Constructors
        ListNode( Object theElement )
        {
            this( theElement, null );
        }

        ListNode( Object theElement, ListNode n )
        {
            element = theElement;
            next = n;
        }

        // Friendly data; accessible by other package routines
        Object element;
        ListNode next;
    }

FIGURE III
A LINKED LIST NODE IMPLEMENTED IN JAVA

The code in Figure III is considerably more comprehensible than the corresponding C++ code, albeit without templates. One of the issues with implementing abstract data types (ADTs) with object oriented languages is the question of obscuring the algorithm by using more sophisticated language constructs needed to implement objects. As the previous example illustrates, this does not necessarily have to be the case.

As another example, consider Kruse’s Pascal and C++ definitions for a binary tree ADT [6, 8].

    type
        treepointer = ^treenode;
        treenode = record
            entry: treeentry;
            left, right: treepointer
        end;

    function TreeEmpty(root: treepointer): Boolean;
    begin
        TreeEmpty := (root = nil)
    end;

FIGURE IV
A BINARY TREE DEFINITION IN PASCAL

    template <class Entry>
    class Binary_tree {
    public:
        Binary_tree();
    protected:
        Binary_node<Entry> *root;
    };

    template <class Entry>
    struct Binary_node {
        Entry data;
        Binary_node<Entry> *left;
        Binary_node<Entry> *right;
        Binary_node();
    };

    template <class Entry>
    Binary_tree<Entry>::Binary_tree()
    {
        root = NULL;
    }

FIGURE V
A BINARY TREE DEFINITION IN C++

These definitions include the data definitions and the implementation of a C++ default constructor and its equivalent in Pascal. Clearly the C++ definition is longer and more complex. Part of the added complexity stems from the fact that in the C++ version, a tree is defined by a class rather than just a pointer variable as in Pascal. In fact, much of the added complexity in implementing ADTs in object oriented languages can be attributed to the use of objects rather than pointers to represent ADTs. Most books in C++ allow for dynamic declarations such as:

    BinaryTree T(100);

This creates the need for copy constructors and destructors. It also allows objects to be passed to and from methods by value, something that could not be done using the pointer representation. Objects are now being created and destroyed implicitly, further adding to the confusion. Java has removed the need for explicit dynamic memory management and I believe this has helped reduce the gratuitous complexity associated with objects in C++.

Many more examples of gratuitous complexity can be found in both older and newer data structures textbooks across a variety of programming languages. The question is “Is gratuitous complexity increasing as languages evolve?” To help shed some light on this situation, an objective measure of syntactic complexity is applied to ADTs defined in various textbooks over time to see if a trend can be found.
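The implicit creation and destruction of objects described above for value-based C++ ADTs can be made visible with a small sketch. The class CountingStack and the helper functions below are hypothetical names introduced only for illustration; they are not taken from any of the textbooks in this study. The sketch counts how many copy constructions a single call triggers when a stack object is passed by value and when it is passed by reference.

```cpp
// Hypothetical array-based stack, used only to count implicit copies.
class CountingStack {
public:
    static int copies;        // incremented by every copy construction
    static int destructions;  // incremented by every destruction

    CountingStack() : top(0) {}

    // Copy constructor: needed once objects can be passed by value.
    CountingStack(const CountingStack& other) : top(other.top) {
        for (int i = 0; i < top; ++i) items[i] = other.items[i];
        ++copies;
    }

    ~CountingStack() { ++destructions; }

    void push(int value) { if (top < Capacity) items[top++] = value; }
    int size() const { return top; }

private:
    static const int Capacity = 100;
    int items[Capacity];
    int top;
};

int CountingStack::copies = 0;
int CountingStack::destructions = 0;

// The parameter is a new object, copy-constructed from the argument.
int sizeByValue(CountingStack s) { return s.size(); }

// The parameter refers to the caller's object; nothing is copied.
int sizeByReference(const CountingStack& s) { return s.size(); }

// Number of copy constructions triggered by one pass-by-value call.
int copiesForValueCall() {
    CountingStack s;
    s.push(1);
    CountingStack::copies = 0;
    sizeByValue(s);
    return CountingStack::copies;
}

// Number of copy constructions triggered by one pass-by-reference call.
int copiesForReferenceCall() {
    CountingStack s;
    s.push(1);
    CountingStack::copies = 0;
    sizeByReference(s);
    return CountingStack::copies;
}
```

In this sketch the by-value call constructs and later destroys one extra stack object that never appears by name in the calling code, while the by-reference call creates none.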

TESTING THE HYPOTHESIS


In order to test the hypothesis that syntactic complexity has
increased in data structures courses as languages have
evolved, an objective measure of complexity is needed along with a representative sample of the implementations of ADTs. A set of four ADTs from nine data structures texts ([2]-[4], [6]-[8], [10]-[12]) was analyzed using software metrics in an attempt to measure syntactic complexity. The textbooks used in this study are categorized in Table I below:

TABLE I
TEXTBOOKS USED IN THE STUDY
Title                                             Author(s)
Pascal Plus Data Structures                       Nell B. Dale, Neil Dale, Susan C. Lilly
C++ Plus Data Structures                          Nell B. Dale
Object Oriented Data Structures using Java        Nell Dale, Daniel T. Joyce, Chip Weems
Data Structures and Program Design                Robert L. Kruse
Data Structures and Program Design in C           Robert L. Kruse, Bruce P. Leung, Clovis L. Tondo
Data Structures and Program Design in C++         Robert L. Kruse, Alex Ryba
Data Structures and Algorithm Analysis            Mark Allen Weiss
Data Structures and Algorithm Analysis in C++     Mark Allen Weiss
Data Structures and Algorithm Analysis in Java    Mark Allen Weiss

These books were chosen because they involve three authors (Dale, Kruse, and Weiss) who have published various data structures books in different languages over time. The functionality within each ADT tends to be consistent within authors, i.e. independent of the implementation language. Another reason these texts were chosen is that each of these authors made copies of their source code available online [13]-[15]. The four ADTs analyzed are common to most data structures textbooks. They are stacks (implemented with arrays), circular queues (also implemented with arrays), singly linked lists, and binary search trees.

Each set of source code was first stripped of comments and blank lines. The following seven metrics were measured:

• Number of characters
• Number of tokens
• Number of lines
• Number of blocks (BC)
• Number of weighted blocks (WBC)
• Number of different identifiers (D-Ident)
• Number of different reserved words (D-RW)

The number of characters, tokens, and lines generally indicates the amount of code needed for implementation. In general, the more code needed to implement an algorithm, the greater the syntactic complexity.

The number of blocks or block count (BC) is a measure of logical “blocks” of code. Blocks are logical groupings of statements used by other syntactic structures. Blocks are determined by BEGIN and END statements in Pascal (with several exceptions) and “{” and “}” in C, C++, and Java. Weighted blocks assign a weight to each block that is the nesting level of that block. For example, if a block is nested within two other blocks, it has a nesting level of three and therefore a weight of three. The weighted block count (WBC), which is the sum of the weighted blocks, is a good measure of complexity since each level of nesting applies an additional scoping and control context, adding to the complexity of an algorithm.

The number of different identifiers (D-Ident) requires the reader to remember the name and purpose of each identifier. A program with many identifiers is analogous to going to a meeting where you are introduced to many people. It becomes difficult to associate names with faces.

As a measure of complexity, the number of different reserved words (D-RW) differs from the number of identifiers. In order to understand an implementation with a greater number of reserved words, the reader must have a greater knowledge of language syntax. The number of identifiers and the number of reserved words were not included as measures because they are a subset of the number of tokens. Remembering the purpose of an identifier involves associating a particular variable or a function with its purpose, whereas remembering the purpose of a reserved word involves grammatical syntax such as flow of control, primitive data types, etc.

Several common complexity measures, such as cyclomatic numbers [9] and Halstead measures [5], were not deemed appropriate because they tend to measure algorithmic complexity rather than syntactic complexity.

Table II shows the results of the metrics applied to the sample code. ADT stands for “Abstract Data Type”, BST stands for “Binary Search Tree”, LL stands for “Linked Lists”, PAS stands for Pascal, and AVG stands for “Average”.

The data in Table II is sorted by the primary key “Author”, secondary key “ADT”, and tertiary key “Language”. The data is grouped by Author first and ADT second.
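The block and identifier metrics defined above can be sketched for the brace-delimited languages (C, C++, and Java). This is a minimal illustration under stated assumptions, not the tool used in the study: the names SyntacticMetrics and measure are hypothetical, the reserved-word list is supplied by the caller because it differs per language, the input is assumed to be already stripped of comments, and string or character literals are not treated specially.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Counts collected for one piece of comment-free, brace-delimited source.
struct SyntacticMetrics {
    int blockCount;             // BC: number of "{ ... }" blocks
    int weightedBlockCount;     // WBC: sum of the nesting level of each block
    int distinctIdentifiers;    // D-Ident
    int distinctReservedWords;  // D-RW
};

// Hypothetical helper: scans the text once, tracking brace depth and
// collecting the distinct words seen.
SyntacticMetrics measure(const std::string& source,
                         const std::set<std::string>& reservedWords) {
    SyntacticMetrics m = {0, 0, 0, 0};
    std::set<std::string> identifiers;
    std::set<std::string> reserved;
    int depth = 0;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (c == '{') {
            ++depth;                        // a new block opens at this level
            ++m.blockCount;
            m.weightedBlockCount += depth;  // weight = nesting level
            ++i;
        } else if (c == '}') {
            if (depth > 0) --depth;
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            // Consume a whole run of letters, digits, and underscores.
            const std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            if (std::isdigit(c)) continue;  // numeric literal, not a name
            const std::string word = source.substr(start, i - start);
            if (reservedWords.count(word) > 0) {
                reserved.insert(word);
            } else {
                identifiers.insert(word);
            }
        } else {
            ++i;  // whitespace and punctuation
        }
    }
    m.distinctIdentifiers = static_cast<int>(identifiers.size());
    m.distinctReservedWords = static_cast<int>(reserved.size());
    return m;
}
```

Under these assumptions, three blocks nested one inside another give a block count of 3 and a weighted block count of 1 + 2 + 3 = 6. Pascal would need a different block detector, since its blocks are delimited by BEGIN and END with several exceptions.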


TABLE II
MEASURE COUNTS

Author  ADT    Language  Chars    Tokens   Lines    BC     WBC    D-Ident  D-RW

DALE    BST    C++       1661     467      88       11     16     37       17
DALE    BST    JAVA      3384     844      203      33     76     54       18
DALE    BST    PAS       3097     676      185      26     66     52       12
AVG                      2714     662.33   158.67   23.33  52.67  47.667   15.667

DALE    LL     C++       1974     507      108      14     16     22       14
DALE    LL     JAVA      1618     369      103      17     37     34       20
DALE    LL     PAS       2556     588      141      18     44     39       16
AVG                      2049.33  488      117.33   16.33  32.33  31.667   16.667

DALE    QUEUE  C++       1359     386      90       11     13     17       16
DALE    QUEUE  JAVA      749      212      49       8      14     15       12
DALE    QUEUE  PAS       1161     253      62       8      15     26       10
AVG                      1089.67  283.67   67       9      14     19.333   12.667

DALE    STACK  C++       804      237      52       7      7      14       11
DALE    STACK  JAVA      1218     258      75       15     27     22       20
DALE    STACK  PAS       1262     290      71       10     27     34       11
AVG                      1094.67  261.67   66       10.67  20.33  23.333   14

KRUSE   BST    C         3217     1094     175      30     46     45       12
KRUSE   BST    C++       4374     1189     187      28     37     44       17
KRUSE   BST    PAS       3079     625      172      23     57     48       17
AVG                      3556.67  969.33   178      27     46.67  45.667   15.333

KRUSE   LL     C         2651     795      140      24     43     23       10
KRUSE   LL     C++       3461     978      167      20     26     40       15
KRUSE   LL     PAS       1761     563      101      18     47     37       22
AVG                      2624.33  778.67   136      20.67  38.67  33.333   15.667

KRUSE   QUEUE  C         782      266      52       9      11     16       8
KRUSE   QUEUE  C++       697      225      44       6      6      14       8
KRUSE   QUEUE  PAS       796      238      48       10     13     27       15
AVG                      758.333  243      48       8.333  10     19       10.333

KRUSE   STACK  C         852      273      60       12     14     19       9
KRUSE   STACK  C++       696      204      48       6      6      13       11
KRUSE   STACK  PAS       605      188      36       9      19     23       15
AVG                      717.667  221.67   48       9      13     18.333   11.667

WEISS   BST    C         1676     550      123      13     19     22       8
WEISS   BST    C++       4957     1347     229      29     35     29       19
WEISS   BST    JAVA      2557     783      153      24     49     36       19
AVG                      3063.33  893.33   168.33   22     34.33  29       15.333

WEISS   LL     C         1705     572      136      17     20     29       8
WEISS   LL     C++       2791     863      142      23     31     34       18
WEISS   LL     JAVA      1981     526      120      22     43     51       18
AVG                      2159     653.67   132.67   20.67  31.33  38       14.667

WEISS   QUEUE  C         1547     507      121      15     19     27       9
WEISS   QUEUE  C++       1240     359      79       9      9      20       12
WEISS   QUEUE  JAVA      1049     309      74       13     27     29       20
AVG                      1278.67  391.67   91.333   12.33  18.33  25.333   13.667

WEISS   STACK  C         1374     410      98       11     12     26       8
WEISS   STACK  C++       1123     325      71       9      9      18       12
WEISS   STACK  JAVA      963      285      66       13     27     27       20
AVG                      1153.33  340      78.333   11     16     23.667   13.333


Since each author has implemented different functionality for a particular data structure, the values in Table II were normalized by dividing each value by the corresponding average value for a particular author and data structure. Because the functionality of a given author’s data structure does not vary with the implementation language, dividing each count by the corresponding average gives a measure that allows a comparison between languages independent of the differing functionalities provided by each author. The data was then sorted by primary key “Language” (Lang) and secondary key “ADT”. The results are shown in Table III.

TABLE III
WEIGHTED MEASURE COUNTS

Author  ADT    Lang  Chars   Tokens  Lines   BC      WBC     D-Ident  D-RW    Average

KRUSE   BST    C     0.904   1.129   0.983   1.111   0.986   0.985    0.783
WEISS   BST    C     0.547   0.616   0.731   0.591   0.553   0.759    0.522
KRUSE   LL     C     1.010   1.021   1.029   1.161   1.112   0.690    0.638
WEISS   LL     C     0.790   0.875   1.025   0.823   0.638   0.763    0.545
KRUSE   QUEUE  C     1.031   1.095   1.083   1.080   1.100   0.842    0.774
WEISS   QUEUE  C     1.210   1.294   1.325   1.216   1.036   1.066    0.659
KRUSE   STACK  C     1.187   1.232   1.250   1.333   1.077   1.036    0.771
WEISS   STACK  C     1.191   1.206   1.251   1.000   0.750   1.099    0.600
Avg                  0.984   1.058   1.085   1.039   0.907   0.905    0.662   0.949

DALE    BST    C++   0.612   0.705   0.555   0.471   0.304   0.776    1.085
KRUSE   BST    C++   1.230   1.227   1.051   1.037   0.793   0.964    1.109
WEISS   BST    C++   1.618   1.508   1.360   1.318   1.019   1.000    1.239
DALE    LL     C++   0.963   1.039   0.920   0.857   0.495   0.695    0.840
KRUSE   LL     C++   1.319   1.256   1.228   0.968   0.672   1.200    0.957
WEISS   LL     C++   1.293   1.320   1.070   1.113   0.989   0.895    1.227
DALE    QUEUE  C++   1.247   1.361   1.343   1.222   0.929   0.879    1.263
KRUSE   QUEUE  C++   0.919   0.926   0.917   0.720   0.600   0.737    0.774
WEISS   QUEUE  C++   0.970   0.917   0.865   0.730   0.491   0.789    0.878
DALE    STACK  C++   0.734   0.906   0.788   0.656   0.344   0.600    0.786
KRUSE   STACK  C++   0.970   0.920   1.000   0.667   0.462   0.709    0.943
WEISS   STACK  C++   0.974   0.956   0.906   0.818   0.563   0.761    0.900
Avg                  1.071   1.087   1.000   0.881   0.638   0.834    1.000   0.930

DALE    BST    JAVA  1.247   1.274   1.279   1.414   1.443   1.133    1.149
WEISS   BST    JAVA  0.835   0.876   0.909   1.091   1.427   1.241    1.239
DALE    LL     JAVA  0.790   0.756   0.878   1.041   1.144   1.074    1.200
WEISS   LL     JAVA  0.918   0.805   0.905   1.065   1.372   1.342    1.227
DALE    QUEUE  JAVA  0.687   0.747   0.731   0.889   1.000   0.776    0.947
WEISS   QUEUE  JAVA  0.820   0.789   0.810   1.054   1.473   1.145    1.463
DALE    STACK  JAVA  1.113   0.986   1.136   1.406   1.328   0.943    1.429
WEISS   STACK  JAVA  0.835   0.838   0.843   1.182   1.688   1.141    1.500
Avg                  0.906   0.884   0.936   1.143   1.359   1.099    1.269   1.085

DALE    BST    PAS   1.141   1.021   1.166   1.114   1.253   1.091    0.766
KRUSE   BST    PAS   0.866   0.645   0.966   0.852   1.221   1.051    1.109
DALE    LL     PAS   1.247   1.205   1.202   1.102   1.361   1.232    0.960
KRUSE   LL     PAS   0.671   0.723   0.743   0.871   1.216   1.110    1.404
DALE    QUEUE  PAS   1.065   0.892   0.925   0.889   1.071   1.345    0.789
KRUSE   QUEUE  PAS   1.050   0.979   1.000   1.200   1.300   1.421    1.452
DALE    STACK  PAS   1.153   1.108   1.076   0.938   1.328   1.457    0.786
KRUSE   STACK  PAS   0.843   0.848   0.750   1.000   1.462   1.255    1.286
Avg                  1.005   0.928   0.978   0.996   1.276   1.245    1.069   1.071

The average for each metric over each language is summarized in the graph shown below:


[Figure VI: chart of the average normalized value of each metric (Chars, Tokens, Lines, BC, WBC, D-Ident, D-RW) for C, C++, Java, and Pascal; vertical axis from 0.000 to 1.600]

FIGURE VI
COMPARING COMPLEXITY METRICS

CONCLUSIONS

The results were surprising. Even though C++ received the lowest overall score, I believe from experience that of the four languages surveyed, C++ contains the most gratuitous complexity. This indicates that there are factors at work influencing syntactic complexity other than those measured. There are, however, some interesting observations.

The graph in Figure VI shows that the Java implementations use fewer characters, tokens and lines of source code than the other three languages; however, there is a cost. Java clearly uses more blocks, nested blocks, and reserved words to achieve this goal.

Both C and C++ required slightly more tokens and lines of source code than Java and Pascal, but had significantly lower weighted block counts than either Java or Pascal. Consider the C++ linked list node code sample from Figure I. The weighted block count for the C++ node is three; however, the extra weight is made up for by the fact that the node contains initializations for the default values of data members.

Another interesting statistic is that the block count for Pascal is relatively low, but the weighted block count is quite high. This can be explained by the fact that Pascal allows for nested procedures and functions, while C and C++ do not.

It is not surprising that C had the fewest different reserved words, since there are significantly fewer reserved words in C than in C++ or Java. What is a surprise is the large number of different reserved words needed in the Pascal implementations, since Pascal also has significantly fewer reserved words than C++ or Java.

C++ requires the highest number of tokens, yet the second lowest number of different reserved words. In Figure VI, C and C++ were tightly coupled, i.e. their graphs are closely related, which might be expected. However, the graphs of Java and Pascal are also tightly coupled.

FUTURE WORK

The main problem encountered in this survey lies in the definition of syntactic complexity. Future work will concentrate on the factors influencing syntactic complexity, allowing for a better definition. These factors are probably best determined through research on humans.

REFERENCES

[1] Alessi, S. M., Trollip, S. R., Computer-based Instruction, 1st edition, 1991.
[2] Dale, N. B., C++ Plus Data Structures, 3rd edition, 2003.
[3] Dale, N. B., Joyce, D. T., Weems, C., Object Oriented Data Structures Using Java, 1st edition, 2002.
[4] Dale, N. B., Joyce, Pascal Plus Data Structures, 2nd edition, 1988.
[5] Halstead, M. H., Elements of Software Science, Operating and Programming Systems Series, Volume 7, 1977.
[6] Kruse, R. L., Data Structures and Program Design, 2nd edition, 1987.


[7] Kruse, R. L., Tondo, C. L., Leung, B., Data Structures and Program Design in C, 2nd edition, 1997.
[8] Kruse, R. L., Data Structures and Program Design in C++, 1st edition, 1998.
[9] McCabe, T. J., A Complexity Measure, IEEE Transactions on Software Engineering, pp. 308-320, December 1976.
[10] Weiss, M. A., Data Structures and Algorithm Analysis, 2nd edition, 1994.
[11] Weiss, M. A., Data Structures and Algorithm Analysis in C++, 2nd edition, 1999.
[12] Weiss, M. A., Data Structures and Algorithm Analysis in Java, 2nd edition, 2002.
[13] ftp://ftp.prenhall.com/pub/esm/computer_science.s-041/kruse
[14] http://computerscience.jbpub.com/cs_resources.cfm
[15] http://www.cs.fiu.edu/~weiss/dsaajava/code/
