
the joy of programming

column collection from linux for you


S G Ganesh sgganesh at gmail dot com

Imprint
Self-published in May 2012. Version 0.4: May 04, 2012

Attribution-ShareAlike CC BY-SA

Table of Contents

Preface
Understanding Bit-fields in C
How to Detect Integer Overflow
Fail Fast!
Abort, Retry, Fail?
Calling Virtual Functions from Constructors
Scope, Lifetime and Visibility in C
Demystifying the Volatile Keyword in C
Some Puzzling Things About C Language!
Silly Programming Mistakes => Serious Harm!
About the Java Overflow Bug
How Debugging Can Result in Bugs!
SNAFU: Situation Normal, All Fouled Up!
The Legacy of C
The Technology Behind Static Analysis Tools
The Broken Window Theory
Levels of Exception Safety
Bug Hunt!
Language Transition Bugs
Penny Wise and Pound Foolish!
Let's Go: A First Look at Google's Go Programming Language
Typo Bugs
Liskov's Substitution Principle
Why is a Software Glitch Called a Bug?
A Bug or a Feature?
Types of Bugs

Preface

The Andromeda Galaxy [1]

This book is a collection of articles I wrote for the Linux For You (LFY) magazine. LFY has done great work typesetting the articles, and has also made them freely available online under the Creative Commons Attribution-ShareAlike licence. I've just collected them together and made them available for your reading pleasure in a single PDF book. The column entries are not in any specific order; I simply listed them as I liked. Also, not all the column articles have been made available online yet; as and when they are, I'll update this book. Most images in this book are under CC-BY-SA and are included in the LFY articles on the website; some images are my own. I've included image credits, as well as links to the original web pages of the corresponding columns on the LFY website. The word cloud that appears on the front page of this book was created using Wordle. What is common to areas as different as programming, astronomy and gardening? It is the sheer joy of doing things just because they make us happy, like gazing at stars at night, or

[1] Creative Commons License: http://en.wikipedia.org/wiki/File:Andromeda_Galaxy_(with_h-alpha).jpg

eating a fruit from a tree that you planted fifteen years ago. I thoroughly enjoyed writing these columns (as well as compiling this book from them), and I hope you enjoy reading them too. If you have any feedback, suggestions, or just anything related to these columns, feel free to drop me a mail: sgganesh at gmail dot com.

Understanding Bit-fields in C
One important feature that distinguishes C as a systems programming language is its support for bit-fields. Let us explore this feature in this column.



Image source: http://en.wikipedia.org/wiki/File:Binary_executable_file2.png

In C, structure members can be given sizes in bits; this feature is known as bit-fields. Bit-fields are important for low-level (i.e., systems programming) tasks, such as directly accessing system resources, processing and reading/writing streams of bits (such as processing packets in network programming), cryptography (encoding or decoding data with complex bit manipulation), etc.
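As a quick illustration of the idea, several small values can be packed into a single word. This date struct is a made-up example, and the exact size and layout of bit-fields are implementation-defined:

```c
#include <stdio.h>

/* Illustrative only: field widths chosen so each value fits;
   the exact struct size and layout are implementation-defined. */
struct date {
    unsigned int day   : 5;   /* 1..31 fits in 5 bits */
    unsigned int month : 4;   /* 1..12 fits in 4 bits */
    unsigned int year  : 7;   /* years 2000..2127, stored as an offset */
};

/* 5 + 4 + 7 = 16 bits of payload, typically packed into a single int. */
unsigned int date_year(struct date d) { return 2000u + d.year; }

void print_date(struct date d) {
    printf("%u-%u-%u packed into %zu bytes\n",
           d.day, d.month, date_year(d), sizeof d);
}
```

On typical compilers, the whole struct occupies one int's worth of storage, instead of the three ints a plain struct would need.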

Consider the example of reading the components of a floating-point number. A 4-byte floating-point number in the IEEE 754 standard consists of the following: the first bit is the sign bit, which is 1 if the number is negative and 0 if it is positive. The next 8 bits store the exponent in biased (unsigned) form; treated as a signed exponent, its value ranges from -127 to +128, and treated as an unsigned value, it ranges from 0 to 255. The remaining 23 bits store the mantissa. Here is a program that prints the constituents of a floating-point number:

struct FP {
    // the order of the members depends on the
    // endian scheme of the underlying machine
    unsigned int mantissa : 23;
    unsigned int exponent : 8;
    unsigned int sign : 1;
} *fp;

int main() {
    float f = -1.0f;
    fp = (struct FP *)&f;
    printf("sign = %s, biased exponent = %u, mantissa = %u\n",
        fp->sign ? "negative" : "positive", fp->exponent, fp->mantissa);
}

For the floating-point number -1.0, this program prints:

sign = negative, biased exponent = 127, mantissa = 0

Since the sign of the floating-point number is negative, the value of the sign bit is 1. The actual exponent is 0, which in the biased (unsigned) format is represented as 127, and hence that value is printed. The mantissa in this case is 0, and is printed as it is.
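As an aside, the same components can be extracted without bit-fields, using shifts and masks. This is only a sketch, assuming a 32-bit IEEE 754 float (the same assumption the bit-field version makes); the float_bits helper and friends are names invented for this illustration:

```c
#include <stdint.h>
#include <string.h>

/* Extract IEEE 754 single-precision components with shifts and masks,
   instead of bit-fields. Assumes float is 32 bits wide. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined reinterpretation */
    return bits;
}

uint32_t float_sign(float f)     { return float_bits(f) >> 31; }        /* top bit */
uint32_t float_exponent(float f) { return (float_bits(f) >> 23) & 0xFFu; } /* next 8 bits */
uint32_t float_mantissa(float f) { return float_bits(f) & 0x7FFFFFu; }  /* low 23 bits */
```

For -1.0f these return 1, 127 and 0 respectively, matching the bit-field program's output; unlike the bit-field version, the shift counts do not depend on how the compiler lays out struct members.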

To understand how floating-point arithmetic works, see the Wikipedia article on the subject. An alternative to using bit-fields is to use integers directly, and manipulate them using bitwise operators (such as &, |, ~, etc.). In the case of reading the components of a floating-point number, we could use bitwise operations as well. However, in many cases such manipulation is a round-about way to achieve what we need; the bit-field solution is more direct, and hence a useful feature.

There are numerous limitations on using bit-fields. For example, you cannot apply operators such as & (address-of) or sizeof to bit-fields. These operators operate in terms of bytes, whereas bit-fields operate in terms of bits; moreover, the underlying machine supports addressing in terms of bytes, not bits, so such operators are not feasible. In other words, an expression such as sizeof(fp->sign) will result in a compiler error.

Then how does it work when expressions such as fp->sign or fp->exponent are used in this program? Note that C allows only integral types as bit-fields, so expressions referring to the bit-fields are converted to integers. In this program, as you can observe, we used the %u format specifier, which is for an unsigned integer: the bit-field value was converted into an integer, and that is why the program worked.

Those new to bit-fields face numerous surprises when they try using them, because a lot of low-level details come into the picture. In the programming example above, you might have noticed the reversal in the order of the sign, exponent and mantissa, which is because of the underlying endian scheme. Endianness refers to how bytes are stored in memory (see the Wikipedia article on endianness for more details). Can you explain the following simple program that makes use of a bit-field?
struct bitfield {
    int bit : 1;
} BIT;

int main() {
    BIT.bit = 1;
    printf("sizeof BIT is = %d\n", (int)sizeof(BIT));
    printf("value of bit is = %d\n", BIT.bit);
}

It prints:

sizeof BIT is = 4
value of bit is = -1

Why? Note that it is not a compiler error to take sizeof(BIT), because BIT is a structure; had we attempted sizeof(BIT.bit), that would not compile. Now, coming to the output: if we used only one bit in the BIT structure, why is sizeof(BIT) 4 bytes? It is because of the addressing requirements of the underlying machine. The machine might require all structs to start at an address divisible by 4; or perhaps allocating the size of a word for the structure is more efficient, even if the underlying machine only requires that structs start at an even address. Also, the compiler is free to add extra bits between struct members (including bit-field members), which is known as padding.

Now let us come to the next output. We set BIT.bit = 1; and the printf statement printed -1! Why was that? Note that we declared the field as int bit : 1; so the compiler treated bit as a signed integer of one-bit size. Now, what is the range of a 1-bit signed integer? It is -1 to 0 (not 0 and 1, which is a common mistake). Remember the formula for the range of signed integers: -2^(N-1) to 2^(N-1)-1, where N is the number of bits. For example, if N is 8 (the number of bits in a byte), the range of a signed integer of size 8 is -2^(8-1) to 2^(8-1)-1, which is -128 to +127. When N is 1, the range of a signed integer of size 1 is -2^(1-1) to 2^(1-1)-1, which is -1 to 0!

No doubt, bit-fields are a powerful feature for low-level bit manipulation. The cost of using bit-fields is the loss of portability: we already saw how padding and endian issues can affect portability in our simple program for reading the components of a floating-point number. Bit-fields should be used in places where space is very limited and the functionality is demanding.
Also, the gain in space could be lost in efficiency: bit-fields take more time to process, since the compiler takes care of (and hides) the underlying complexity of the bit manipulation needed to get or set the required data. Bugs associated with bit-fields can be notoriously hard to debug, since we need to understand the data in terms of bits. So, use bit-fields sparingly and with care.
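One way to avoid the -1 surprise from the example above is to declare single-bit fields as unsigned. A minimal sketch (the flags struct and flags_demo name are made up for this illustration):

```c
/* A 1-bit signed field can only hold 0 and -1; declaring the field
   unsigned gives it the expected range of 0 and 1. */
struct flags {
    unsigned int ready : 1;
    unsigned int error : 1;
};

int flags_demo(void) {
    struct flags f = { 0, 0 };
    f.ready = 1;            /* stays 1, since the field is unsigned */
    return f.ready;
}
```

Here flags_demo() returns 1, where the signed version of the same code would have returned -1.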

How to Detect Integer Overflow


Integer overflows often result in nasty bugs. In this column, we'll look at some techniques to detect an overflow before it occurs.

Overflow in a mechanical odometer

Integer overflow happens because computers use a fixed width to represent integers. So which operations can result in overflow? Bitwise and logical operations cannot overflow, while cast and arithmetic operations can. For example, the ++ and += operators can overflow, whereas the && or & operators (or even the << and >> operators) cannot. Regarding arithmetic operators, it is obvious that operations like addition, subtraction and multiplication can overflow. How about operations like (unary) negation, division and mod (remainder)? For unary negation, -MIN_INT is equal to MIN_INT (and not MAX_INT), so it overflows. Following the same logic, division overflows for the expression (MIN_INT / -1). How about a mod


operation? It does not overflow: the only possible overflow case, (MIN_INT % -1), is equal to 0 (verify this yourself; the formula for the % operator is a % b = a - ((a / b) * b)).

Let us focus on addition. For the statement int k = (i + j);:

1. If i and j are of different signs, it cannot overflow.
2. If i and j are of the same sign (- or +), it can overflow.
3. If i and j are positive integers, their sign bit is zero. If k is negative, its sign bit is 1, which indicates that the value of (i + j) is too large to represent in k, so it overflows.
4. If i and j are negative integers, their sign bit is one. If k is positive, its sign bit is 0, which indicates that the value of (i + j) is too small to represent in k, so it overflows.

To check for overflow, we have to provide checks for conditions 3 and 4. Here is a straightforward conversion of these two conditions into code. The function isSafeToAdd returns 1 or 0 after checking for overflow:

/* Is it safe to add i and j without overflow?
   Return value 1 indicates there is no overflow;
   else it is overflow and not safe to add i and j */
int isSafeToAdd(int i, int j) {
    int k = i + j;
    if( ((i < 0 && j < 0) && k >= 0) ||
        ((i > 0 && j > 0) && k <= 0) )
        return 0;
    return 1; // no overflow - safe to add i and j
}

Well, this does the work, but it is inefficient. Can it be improved? Let us go back and see what (i + j) is when it overflows. If ((i + j) > INT_MAX) or ((i + j) < INT_MIN), it overflows. But if we translate this condition directly into code, it will not work:

if ( ((i + j) > INT_MAX) || ((i + j) < INT_MIN) )
    return 0; // wrong implementation

Why? Because (i + j) overflows, and when its result is stored, it can never be greater than INT_MAX or less than INT_MIN! That is precisely the condition (overflow) we want to detect, so it won't work. How about modifying the checking expression? Instead of ((i + j) > INT_MAX), we can check the condition (i > INT_MAX - j) by moving j to the RHS of the expression. So, the condition in isSafeToAdd can be rewritten as:

if( (i > INT_MAX - j) || (i < INT_MIN - j) )
    return 0;

That works! But can we simplify it further? From condition 2, we know that for an overflow to occur, the signs of i and j should be the same. And from conditions 3 and 4, when overflow occurs, the sign bit of the result k differs from that of i and j. Doesn't that suggest a check using the ^ operator? How about this:

int k = (i + j);
if( ((i ^ k) & (j ^ k)) < 0)
    return 0;

Let us check it. Assume that i and j are positive values, so when the addition overflows, the result k will be negative. Now the expression (i ^ k) will be a negative value: the sign bit of i is 0 and the sign bit of k is 1, so the ^ of the sign bits is 1, and hence the value of (i ^ k) is negative. The same holds for (j ^ k), and the & of two negative values is negative; hence the check against < 0 becomes true when there is overflow. When i and j are negative and k is positive, the condition is again < 0 (by the same logic). So, yes, this also works! Though the if condition is not very easy to understand, it is correct and is also an efficient solution!
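Putting the pieces together, here is a sketch of both checks. Two caveats not spelt out above: INT_MIN - j can itself overflow when j is positive, so the rearranged check needs guards on j's sign; and in standard C, letting a signed addition actually overflow is undefined behaviour, so the XOR check below wraps in unsigned arithmetic (which is well-defined) and assumes a 32-bit int. isSafeToAdd is the article's name; isSafeToAddXor is a name invented for this sketch:

```c
#include <limits.h>

/* Pre-check without performing the addition. The guards on j's sign
   keep INT_MAX - j and INT_MIN - j themselves from overflowing. */
int isSafeToAdd(int i, int j) {
    if (j > 0 && i > INT_MAX - j) return 0;   /* would exceed INT_MAX */
    if (j < 0 && i < INT_MIN - j) return 0;   /* would go below INT_MIN */
    return 1;
}

/* The XOR trick, done in unsigned arithmetic to stay well-defined.
   Assumes a 32-bit int. */
int isSafeToAddXor(int i, int j) {
    unsigned int ui = (unsigned int)i, uj = (unsigned int)j;
    unsigned int uk = ui + uj;                /* unsigned wraparound is defined */
    /* overflow iff the result's sign bit differs from both operands' */
    return (((ui ^ uk) & (uj ^ uk)) >> 31) == 0;
}
```

For example, both functions report that INT_MAX + 1 and INT_MIN + (-1) are unsafe, while mixed-sign additions are always safe.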


Fail Fast!
When a problem occurs in software, it should fail immediately, in an easily noticeable way. This "fail fast" behaviour is desirable, and we'll discuss this important concept in this column.

At first, fail fast might appear to be a bad practice affecting reliability: why should a system crash (or fail) when it can continue execution? To see why, we need to understand that fail fast is most relevant in the context of Heisenbugs. Consider Bohrbugs, which always crash for a given input, for example, with a null-pointer access. These bugs are easier to test, reproduce and fix. Now, all experienced programmers have faced situations where the bug that caused a crash simply disappears when the software is restarted. No matter how much time and effort is spent trying to reproduce the problem, the bug eludes us. These bugs are known as Heisenbugs. The effort required to find, fix and test Heisenbugs is an order of magnitude more than that required for Bohrbugs. One strategy to avoid Heisenbugs is to turn them into Bohrbugs. How? By anticipating the possible cases in which Heisenbugs can arise, and trying


to make them Bohrbugs. Yes, it is not easy, and it is not always possible either, but let us look at a specific example where it is useful. Concurrent programming is one paradigm where Heisenbugs are common, and our example is a concurrency-related issue in Java. While iterating over a Java collection, we are supposed to modify the collection only through the Iterator's methods, such as remove(). During iteration, if another thread attempts to modify the underlying collection (because of a programming mistake), the collection gets corrupted (i.e., ends up in an incorrect state). Such an incorrect state can lead to an eventual failure, or, if we are fortunate (actually, unfortunate!), the program continues execution without crashing but gives wrong results. Such bugs are difficult to reproduce and fix, because the programming mistakes behind them are non-deterministic; in other words, this is a Heisenbug. Fortunately, the Java Iterators try to detect such concurrent modifications and, if found, throw a ConcurrentModificationException, instead of failing late, and silently at that. In other words, the Java Iterators follow the fail fast approach. What if a ConcurrentModificationException is observed in production software? As the Javadoc for this exception notes, it should be used only to detect bugs. In other words, ConcurrentModificationExceptions are supposed to be found and fixed during software development, and should not leak into production code. If production software does get this exception, it is certainly a bug, and should be reported to the developer and fixed. At least we know that there was an attempt at concurrent modification of the underlying data structure, and that is why the software failed (instead of the software giving wrong results, or failing later with some other symptom whose root cause is not feasible to trace). The fail fast approach helps in developing robust code.
A very good example of writing fail-fast code is using assertions. Unfortunately, there is a lot of unnecessary controversy surrounding the use of asserts. The main criticism is that the checks are enabled in the development version and disabled in release versions. However, this criticism misses the point: asserts were never meant to replace the defensive checks that should be in place in the release version of the software. For example, asserts should not be used to check whether the argument passed to a function is null. Instead, an if condition should check whether the argument was passed correctly, and an exception or a premature return should follow, as appropriate to the context. However, asserts can

be used for additional checks on assumptions made in the code, which are supposed to hold true: for example, a check that the stack is not empty after a push operation has been performed on it (i.e., checking invariants). So, fail fast, be assertive, and you're on the way to developing more robust code.
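The invariant check described above can be sketched in C with the standard assert macro. The fixed-size stack and the names here are hypothetical, made up for this illustration; note how the capacity check is a defensive if (kept in release builds), while the postcondition is an assert (a fail-fast check for debug builds):

```c
#include <assert.h>

/* A toy fixed-capacity stack, just for illustration. */
#define STACK_CAP 16

struct stack { int data[STACK_CAP]; int top; };

/* Returns 1 on success, 0 if the stack is full. */
int push(struct stack *s, int v) {
    if (s->top == STACK_CAP) return 0;  /* defensive check: stays in release builds */
    s->data[s->top++] = v;
    assert(s->top > 0);                 /* invariant: stack non-empty after a push;
                                           fails fast in debug builds if violated */
    return 1;
}
```

If the invariant ever breaks (say, a future edit corrupts top), the assert aborts immediately at the point of the mistake, turning a would-be Heisenbug into a Bohrbug.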


Abort, Retry, Fail?


Exception handling is tricky to get right. In this column, I present some guidelines for writing good exception handling code, by giving real-world (counter) examples.

Guideline 1 Write descriptive error messages.

Guideline 2


Check for syntax errors before shipping! If your application is written in an interpreted or scripting language, your users can end up getting syntax errors if you don't test it well.

Guideline 3 Write helpful error messages. In the following example, what exactly is the difference between Abort and Fail? Also, it is better to specify what needs to be done to recover from the situation; for example, "Insert disk" instead of just "Retry".

Guideline 4 Don't contradict yourself. When throwing an error back to the user, the description should support why the exception happened, not contradict it.

Guideline 5 Don't try humour. If yours is a critical application, your users will not laugh when an exceptional situation occurs!


Calling Virtual Functions from Constructors


Calling virtual functions from constructors is problematic, and this problem can manifest itself in many ways. In this column, we'll take a look at this problem, with specific examples.

Last year, I bought a BlackBerry mobile. It came with software that can be installed on a PC, with which one can transfer songs, data, etc., from the PC to the mobile. When I installed the software and started it, it promptly crashed with the exception: "pure virtual function call"! Surprisingly, over a period of five years, I've faced the same problem many times; some of the screenshots I've taken from different software are shown in Figures 1 to 3.


Figure 1: Pure virtual function call runtime error in Firefox

Figure 2: Pure virtual function call runtime error in BlackBerry software

Figure 3: Pure virtual function call runtime error in Acrobat Reader


Note that this behaviour is not specific to Windows software; software compiled with GCC on Linux will fail with a similar exception. Now, let us dig deeper to understand this software bug. Virtual functions are resolved based on the runtime type of the object on which they are called. Invoking virtual methods from constructors or destructors is problematic. Why? Consider the case of calling a virtual function from a constructor. When the base class constructor executes, the derived object is not constructed yet. If there is a virtual function call that is supposed to bind to the derived type, how can it be handled? OO languages differ in how they handle this situation. In C++, the virtual function is treated as non-virtual, and the base type's method is called. In Java, the call is resolved to the derived type. Both approaches can cause unintuitive results. Let us first discuss the C++ case, and then move on to the Java case. In C++, if you try the following program, it will print "Inside base::vfun", since the virtual function is resolved to the base type (i.e., the static type, instead of the dynamic type):

struct base {
    base() {
        vfun();
    }
    virtual void vfun() {
        cout << "Inside base::vfun\n";
    }
};

struct deri : base {
    virtual void vfun() {
        cout << "Inside deri::vfun\n";
    }
};

int main() {
    deri d;
}

Now, how about this program:

struct base {
    base() {
        base * bptr = this;
        bptr->bar();
        // even simpler ...
        ((base*)(this))->bar();
    }
    virtual void bar() = 0;
};

struct deri : base {
    void bar() { }
};

int main() {
    deri d;
}

Now, you'll get the "pure virtual function call" exception thrown by the C++ runtime, similar to the three screenshots we saw earlier! In this case, the bar() method is a pure virtual function, which means it is not defined yet (it is defined later, in a derived class). However, since we are invoking bar() from the base class constructor, the runtime tries to call the pure virtual function; it is not possible to invoke a function that is not defined, and hence it results in a runtime exception (technically, it is undefined behaviour).


Note how we invoked bar() in the base class constructor: after casting the this pointer to the (base *) type. If we attempt to call a pure virtual function directly, the compiler will give a compile-time error. Now, let's look at a simple Java example. Can you predict its output?

class Base {
    public Base() {
        foo();
    }
    public void foo() {
        System.out.println("In Base's foo");
    }
}

class Derived extends Base {
    public Derived() {
        i = new Integer(10);
    }
    public void foo() {
        System.out.println("In Derived's foo " + i.toString());
    }
    private Integer i;
}

class Test {
    public static void main(String [] s) {
        new Derived().foo();
    }

}

The program crashes with a NullPointerException! Why? As I mentioned earlier, in Java, virtual functions are resolved to the dynamic type. Here, foo is a virtual function (in Java, all non-static, non-final methods are virtual) and we invoke it from the constructor. Since the call resolves to the dynamic type, the derived version of foo is called. Remember that we are still executing the base class constructor, and the derived constructor is yet to execute; hence the private field i inside Derived is not initialised yet (all reference-type variables are initialised to null in Java). The call i.toString() therefore dereferences the yet-to-be-initialised field, and results in a NullPointerException. The C# approach to calling virtual functions is similar to Java's. Calling virtual functions from constructors/destructors is risky, no matter which OO language we use. Even if the program works in cases where virtual functions are called from constructors, it can suddenly start failing when we extend the base classes in which such calls are present. Hence, it is bad programming practice to call virtual functions from constructors/destructors, and most static analysers warn about this problem.


Scope, Lifetime and Visibility in C


Often, programmers confuse the scope, lifetime and visibility of variables. So I'll cover these three important concepts in this month's column.

Whenever you declare a variable, you determine its scope, lifetime and visibility. These are three important concepts associated with any variable declared in C. Understanding the difference between them, and how they relate to each other, will help you avoid mistakes in writing code.


Scope

Scope is defined as the area in which the declared name is available. There are five scopes in C: program, file, function, block and prototype. Let us examine a dummy program to understand the difference (the comments indicate the scope of each name):

void foo() {}              // "foo" has program scope

static void bar() {        // "bar" has file scope
    printf("hello world");
    int i;                 // "i" has block scope
print:                     // "print" has function scope
    ;
}

void baz(int j);           // "j" has prototype scope

The foo function has program scope. All non-static functions have program scope, and they can be called from anywhere in the program. Of course, to make such a call, the function needs first to be declared using extern before being called, but the point is that it is available throughout the program. The function bar has file scope: it can be called only from within the file in which it is declared. It cannot be called from other files, unlike foo, which can be called after providing an external declaration. The label print has function scope. Remember that labels are used as targets for jumps using goto in C. There can be only one print label inside a function, and you can write a goto print statement anywhere in that function, even before the label appears. Only labels have function scope in C.


The variable i has block scope, though it is declared at the same level/block as print. Why is that so? The answer: we can define another variable with the same name i inside another block within the bar function, whereas that is not possible for print, since it is a label. The variable j has prototype scope: you cannot declare another parameter with the same name j in the prototype of baz. Note that the scope of j ends with the prototype declaration: you can define the function baz with its first argument given any name other than j.

Lifetime

The lifetime of a variable is the period of time for which the variable is allocated space (i.e., the period for which it lives). There are three lifetimes in C: static, automatic and dynamic. Let us look at an example:

int foo() {
    static int count = 0;                // "count" has static lifetime
    int * counter = malloc(sizeof(int)); // "counter" has automatic lifetime
    free(counter);                       // malloc'ed memory has dynamic lifetime
}

In this code, the variable count has a static lifetime, i.e., its lifetime is that of the program. The variable counter has an automatic lifetime: its life lasts until the function returns. It points to a heap-allocated memory block, whose life lasts until it is explicitly freed by the program; since that point is not predictable, the block has a dynamic lifetime.

Visibility

Visibility is the accessibility of a declared variable. It is the result of variables in outer scopes being hidden. Here is a dummy example:


int i;
// the outer "i" variable is accessible/visible here
void foo() {
    int i;
    // the outer "i" variable
    // is not accessible/visible here
    {
        int i;
        // the two "i" variables in outer scopes
        // are not accessible/visible here
    }
    // the "i" declared in this function is accessible/visible
    // here, and it still hides the outermost "i"
}
// the outermost "i" variable
// is accessible/visible here

Summary of differences

As you can see, scope, lifetime and visibility are related to each other, but distinct. Scope is about the availability of the declared name: within the same scope, it is not possible to declare two variables with the same name. Lifetime is about the duration for which the variable is alive: it determines how long the named or unnamed variable has memory allocated to it. Visibility is about the accessibility of the declared variables: it arises because variables in outer scopes can have the same name as those in inner scopes, resulting in hiding.
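The hiding rules above can also be seen in running code. A minimal sketch (shadow_demo is a made-up helper; the values are chosen arbitrarily for illustration):

```c
/* Each inner declaration of "i" hides (shadows) the outer one; the outer
   variable still exists, it is just not visible by that name. */
int shadow_demo(void) {
    int i = 1;          /* outer "i" */
    int sum = i;        /* reads the outer "i": 1 */
    {
        int i = 2;      /* hides the outer "i" in this block */
        sum += i;       /* reads the inner "i": 2 */
    }
    sum += i;           /* the outer "i" is visible again: 1 */
    return sum;         /* 1 + 2 + 1 */
}
```

The function returns 4, confirming that each read picks up the innermost visible i, and that the outer variable is untouched while hidden.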


Demystifying the Volatile Keyword in C


Most programmers don't understand the meaning and significance of the volatile keyword. So let's explore it in this month's column.

One of my favourite interview questions for novice programmers is: "What is the use of the volatile keyword?" For experienced programmers, I ask: "Can we qualify a variable as both const and volatile? If so, what does it mean?" I bet most of you don't know the answer, right? The keyword volatile has to do with compiler optimisation. Consider the following code:

long *timer = (long *) 0x0000ABCD;
// assume that at location 0x0000ABCD the current time is available
long curr_time = *timer; // initialise curr_time to the value from timer
// wait in the while loop for 1 sec (i.e., 1000 millisec)
while( (curr_time - *timer) < 1000 ) {
    curr_time = *timer; // update current time
}
print_time(curr_time); // this function prints the current time from the
                       // passed long variable
Usually, hardware has a timer that can be accessed from a memory location. Here, assume that it's 0x0000ABCD, accessed using a long * variable named timer (in the UNIX tradition, time can be represented as a long, with increments in milliseconds). The loop is

meant to wait one second (1,000 milliseconds) by repeatedly updating curr_time with the new value from the timer. After the one-second delay, the program prints the new time. Looks fine, right? However, from the compiler's point of view, what the loop does is stupid: it repeatedly assigns curr_time from *timer, which is equivalent to doing it once outside the loop. Also, the variable timer is de-referenced repeatedly in the loop, when it is enough to do it once. So, to make the code more efficient (i.e., to optimise it), the compiler may transform the loop as follows:

curr_time = *timer; // update current time
long temp_time = *timer;
while( (curr_time - temp_time) < 1000 ) {
    /* do nothing here */
}
As you can see, the result of this transformation is disastrous: the loop will never terminate, because curr_time is never updated and the timer is never de-referenced again to get new (updated) time values. What we need is a way to tell the compiler not to play around with such variables, by declaring them volatile, as in:

volatile long * timer = (volatile long *) 0x0000ABCD;

Now the compiler will not do any such optimisation on these variables. This, essentially, is the meaning of the volatile keyword: it declares variables as asynchronous variables, i.e., variables that are not modified sequentially by the program alone. Implicitly, all variables not declared volatile are synchronous variables. How about qualifying a variable as both const and volatile? As we know, when we declare a variable const, we mean it is read-only: once we initialise it, we will not change it again, only read its value. Here is a modified version of the example:

long * const timer = (long *) 0x0000ABCD;
// rest of the code as it was before...


We will never change the address of a timer, so we can put it as a const variable. Now, remember what we did to declare the timer as volatile:

volatile long * timer = (volatile long *) 0x0000ABCD;


We can now combine const and volatile together:

volatile long * const timer = (volatile long *) 0x0000ABCD;


It reads as follows: timer is a const pointer to a volatile long. In plain English, it means that timer is a variable that I will not change; it points to a value that can change without the knowledge of the compiler!
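A common real-world use of volatile is a flag set from a signal handler and read by the main loop; volatile sig_atomic_t is the standard C idiom for this. A sketch (on_sigint and install_and_check are names made up for this illustration):

```c
#include <signal.h>

/* A flag written by a signal handler and read by the main program.
   "volatile" tells the compiler the value may change at any time, so
   reads of it cannot be hoisted out of a loop or cached in a register. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int sig) {
    (void)sig;
    stop_requested = 1;   /* setting a sig_atomic_t flag is all that is
                             safe to do in a handler */
}

int install_and_check(void) {
    signal(SIGINT, on_sigint);
    /* in a real program: while (!stop_requested) { do_work(); } */
    return stop_requested;  /* 0 until a SIGINT actually arrives */
}
```

Without volatile, the compiler could legally transform `while (!stop_requested)` into an infinite loop, exactly like the timer example above.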

Some Puzzling Things About C Language!


Have you wondered why some features of the C language are unintuitive? As we'll see in this column, there are historical reasons for many of C's features.

Image source: http://en.wikipedia.org/wiki/File:The_C_Programming_Language_cover.svg

1. Can you guess why there is no distinct format specifier for double in the printf/scanf format string, although it is one of the four basic data types? (Remember, we use %lf for the double value in printf/scanf; %d is for integers.)

2. Why is the precedence of some operators in C wrong? For example, the equality operators (==, !=, etc.) have higher precedence than the logical operators (&&, ||).

3. In the original C library, <math.h> has all operations done in double precision, i.e., long float or double (and not single precision, i.e., float). Why?

4. Why is the output file of the C compiler called a.out?

Answers:

1. In older versions of C, there was no double: there was just the long float type, and that is why it has the format specifier %lf (%d was already in use to indicate signed decimal values). Later, the double type was added to indicate that the floating point type might be of double precision (IEEE format, 64-bit value). So the format specifier for long float and double was kept the same.

2. The confusion in the precedence of the logical and equality operators is the source of numerous bugs in C. For example, in (a && b == c && d), == has higher precedence than &&, so it is interpreted as (a && (b == c) && d), which is not intuitive. There is a historical background for this wrong operator precedence. Here is the explanation given by Dennis Ritchie [see Dennis M. Ritchie, Operator precedence, net.lang.c, 1982]:

Early C had no separate operators for & and && or | and ||. Instead it used the notion (inherited from B and BCPL) of truth-value context: where a Boolean value was expected, after if and while and so forth, the & and | operators were interpreted as && and || are now; in ordinary expressions, the bit-wise interpretations were used. It worked out pretty well, but was hard to explain. (There was the notion of top-level operators in a truth-value context.) The precedence of & and | were as they are now. Primarily at the urging of Alan Snyder, the && and || operators were added. This successfully separated the concepts of bit-wise operations and short-circuit Boolean evaluation. However, I had cold feet about the precedence problems. For example, there were lots of programs with things like: if (a==b & c==d). In retrospect it would have been better to go ahead and change the precedence of & to higher than ==, but it seemed safer just to split & and && without moving & past an existing operator.

3.
Since C was originally designed for writing UNIX (system programming), the nature of its application reduced the necessity for floating point operations. Moreover, in the hardware of the original and initial implementations of C (PDP-11) floating point arithmetic was done in double precision (long float or double type) only. Writing library functions seemed to be easy if only one type was handled. For these reasons, the library functions involving mathematics (<math.h>) were done for double types, and all the floating point calculations were promoted and were done in double precision only. For

the same reason, when we use a floating point literal, such as 10.0, it is treated as double precision and not single precision.

4. a.out stands for assembler output [see cm.bell-labs.com/who/dmr/chist.html]. The original UNIX was written using an assembler for the PDP-7 machine. The output of the assembler had a fixed file name, which was a.out, to indicate that it was the output file from the assembler. No assembly needs to be done in modern compilers; instead, linking and loading of object files is done. However, the tradition continues, and the output of cc is still a.out by default!


Silly Programming Mistakes => Serious Harm!


As programmers, we know that almost any software that we use (or write!) has bugs. What we might not be aware of is that many disasters occur because of silly mistakes.

Image source: http://en.wikipedia.org/wiki/File:Ariane_5_(mock-up).jpg

What can software bugs cost? Nothing, I hear someone saying. They can be beneficial and ensure job securitysince the more bugs we put in the software, the more work we get in the future to fix those embedded bugs!


On a more serious note, software bugs can even cost human lives. Many mishaps and disasters have happened in the past because of software bugs [see Collection of Software Bugs for a detailed list]. For example, during the 1980s, at least six people were killed because of a synchronisation bug in the Therac-25 radiation treatment machine. In 1996, the Ariane 5 rocket exploded shortly after its take-off because of an unhandled overflow exception. A sobering thought about software bugs is that, though they might occur because of silly or innocuous mistakes, they can cause serious harm.

In 1962, the Mariner-I rocket (meant to explore Venus) veered off track and had to be destroyed. It had a few software bugs, and one main problem was traced to the following Fortran statement: DO 5 K = 1. 3. The '.' should have been a comma. The statement was meant to be a DO loop, as in DO 5 K = 1, 3, but while typing the program, it was mistyped as DO 5 K = 1. 3. So, what's the big deal? In old Fortran, spaces were ignored, so identifiers could contain spaces (yes, believe me, it's true). Hence this became a declaration of a variable of the real type named DO5K, with an initial value of 1.3, instead of a DO loop. So, a rocket worth $18.5 million was lost because of a typo!

In 1990, the AT&T long distance telephone network crashed for nine hours because of a software bug. It cost the company millions of dollars. The mistake was the result of a misplaced break statement. The code inside a switch statement looked like the following (from Expert C Programming, Peter van der Linden, Prentice Hall PTR, 1994):


network code()
{
    switch (line) {
    case THING1:
        doit1();
        break;
    case THING2:
        if (x == STUFF) {
            do_first_stuff();
            if (y == OTHER_STUFF)
                break;
            do_later_stuff();
        }   /* coder meant to break to here... */
        initialize_modes_pointer();
        break;
    default:
        processing();
    }   /* ...but actually broke to here! */
    use_modes_pointer();  /* leaving the modes_pointer uninitialized */
}

As you can see, the programmer has put a break; after the if condition. He actually wanted to break out of the if condition; but the control gets transferred outside the (enclosing) switch statement! We all know that it is not possible to use break; to come out of an if block: this simple mistake resulted in a huge loss to AT&T.

Programmers are usually surprised at how silly mistakes, such as the use of the wrong operator symbols, the wrong termination condition for a loop, etc., can lead to serious software problems. True, while most such mistakes will not cause any harm, some minor errors could sometimes lead to major disasters.


References
1. Collection of Software Bugs
2. Expert C Programming, Peter van der Linden, Prentice Hall PTR, 1994




Article written by:


S.G. Ganesh


The author works for Siemens (Corporate Technology). He is the author of the bestseller 'Cracking the C, C++ and Java Interview' published by Tata McGraw-Hill.


About the Java Overflow Bug



In this column, we'll discuss a common overflow bug in the JDK, which surprisingly occurs in widely used algorithms like binary search and mergesort in C-based languages.



By S.G. Ganesh on February 1, 2009

How does one calculate the average of two integers, say i and j? Trivial, you would say: it is (i + j) / 2. Mathematically, that's correct, but it can overflow when i and j are either very large or very small when using fixed-width integers in C-based languages (like Java). Many other languages like Lisp and Python do not have this problem. Avoiding overflow when using fixed-width integers is important, and many subtle bugs occur because of this problem.

In his popular blog post [1], Joshua Bloch (Java expert and author of books on Java intricacies) writes about how a bug [2] in the binarySearch and mergeSort algorithms was found in his code in the java.util.Arrays class in the JDK. It read as follows:
1   public static int binarySearch(int[] a, int key) {
2       int low = 0;
3       int high = a.length - 1;
4
5       while (low <= high) {
6           int mid = (low + high) / 2;
7           int midVal = a[mid];
8
9           if (midVal < key)
10              low = mid + 1;
11          else if (midVal > key)
12              high = mid - 1;
13          else
14              return mid; // key found
15      }
16      return -(low + 1); // key not found.
17  }

The bug is in line 6: int mid = (low + high) / 2;. For large values of low and high, the expression overflows and becomes a negative number (since low and high represent array indexes, they cannot be negative).

However, this bug is not really new; rather, it is usually not noticed. For example, the classic K & R book [3] on C has the same code (pg 52). For pointers, the expression (low + high) / 2 is wrong and will result in a compiler error, since it is not possible to add two pointers. So, the book's solution is to use subtraction (pg 113):

mid = low + (high-low) / 2

This finds mid when high and low are of the same sign (they are pointers, so they can never be negative). This is also a solution for the overflow problem we discussed in Java.

Is there any other way to fix the problem? If low and high are converted to unsigned values and then divided by 2, it will not overflow, as in:

int mid = ( (unsigned int) low + (unsigned int) high) / 2;

But Java does not support unsigned numbers. Still, Java has an unsigned right shift operator (>>>): it fills the right-most shifted bits with 0 (positive values remain as positive numbers; also known as value preserving). For the Java right shift operator >>, the sign of the filled bit is the value of the sign bit (negative values remain negative and positive values remain positive; also known as sign preserving). Just as an aside for C/C++ programmers: C/C++ has only the >> operator, and it can be sign or value preserving, depending on the implementation. So we can use the >>> operator in Java:

int mid = (low + high) >>> 1;

The result of (low + high), when treated as an unsigned value and right-shifted by 1, does not overflow!

Interestingly, there is another nice trick to finding the average of two numbers: (i & j) + (i ^ j) / 2. This expression looks strange, doesn't it? How do we get it? Hint: it is based on a well-known Boolean equality: (A AND B) + (A OR B) = A + B = (A XOR B) + 2 (A AND B).

A related question: how do you detect overflow when adding two ints? It's a very interesting topic, and is the subject for another column.

References
1. googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
2. bugs.sun.com/bugdatabase/view_bug.do?bug_id=5045582
3. The C Programming Language, Brian W. Kernighan, Dennis M. Ritchie, Prentice-Hall, 1988.

How Debugging Can Result in Bugs!


We typically debug code to find and fix bugs. However, debugging itself can cause bugs. This is an interesting phenomenon that we cover in this month's column.

Consider the following simple Java example, which I recently encountered:

Logger.log("Skipping undefined data type: " + dataType.getCategory().toString());

This is debugging code to print a log message that was meant for tracking what was going wrong in the program. The program crashed with a NullPointerException because the code did not check for null before accessing dataType or getCategory(). In other words, the very code that was meant for debugging introduced a bug! This prompted me to write about the topic.


It is a good practice to save debug information and trace information in log files. When the application crashes at the customer's site, if the log files contain all the relevant information, it is possible to trace the cause of the problem by just analysing the log files.

However, a major problem with logging/tracing messages is that they generate huge amounts of data (sometimes, of the order of a few GBs!) and it is easy to get lost in the details. A practical approach is to introduce multiple trace levels, which is useful for troubleshooting and debugging. In the case of multiple threads, there needs to be some way of matching trace messages originating from a given thread of control (and process). Otherwise, if no such identification and time-stamping of trace messages is available, it will be difficult to make use of an exceptionally large trace file. For this reason, some projects develop custom scripts/tools to process the log files and report the problems! Also note that log files need to be removed periodically, if they grow beyond the allowed size. I know of an application that used to crash often because the log files became so huge that no more data could be written to them.

Sometimes, when debugging code is added to a system to understand how it is working, the debug code can introduce new bugs. For example, for diagnostic purposes, test probes can be added to get intermediate values at fixed locations in code. This process of introducing test probes can also bring with it subtle timing errors, particularly in the code of embedded systems, where response time is critical. In other words, the very process of examining the system can alter it!

Debug code can also introduce security issues. In 1988, when the Internet was in the early stages of its development, a worm affected around 5 per cent of the computers connected to the Internet. The worm affected only Sun and VAX machines.
It collected host, network and user information, based on which it broke into other machines. The affected machines were overloaded with unknown processes, and killing the processes did not help. Rebooting also didnt solve the problems. It was later found that the worm exploited three different vulnerabilities in Unix systems: a buffer overrun vulnerability in fingerd, the debug mode of the Sendmail program, and accounts with weak (or no) passwords. Our interest here is in the attack on Sendmail that exploited debug code. The worm would send a DEBUG command to Sendmail and fork a shell (sh); it would use that shell to download and compile new worms, and thus spread the worm.


Why did Sendmail allow DEBUG code? It was provided to allow testers to verify that mail was arriving at a particular site, without the need to invoke the address-resolution routines! For more details about this worm, see the article, Crisis and Aftermath by E H Spafford, Communications of the ACM, June 1989.


SNAFUSituation Normal, All Fouled Up!


The stories of software development projects in crisis are amazingly familiar to all experienced programmers and managers. In this column, we'll see which aspects of projects in crisis are strikingly similar, and how they relate to bugs.

A software product is inseparable from the underlying software process that created it. Though bugs are technical in nature, it is the software development process that has the most impact on resulting in bugs. To illustrate this, see what happens in a new software project when Raphus cucullatus, nicknamed Dodo, is given the responsibility of managing the project.

All is well: Dodo kick-starts the new project with a team of experienced developers. Dodo thinks that software can be produced under tight deadlines and creates a project plan. The customer is satisfied with the plan.

Monitoring based on effort spent: Dodo monitors the project by effort spent rather than progress made. The customer is happy to see the considerable amount of effort spent on the project.

Focus on functionality completion: Dodo feels quite happy to closely follow the functionality completion. He accepts all features that the customer requests. The customer is glad that the software has a good set of features.

NFRs become a problem: However, during the last few months, when milestones approach and integration and system testing are done, all kinds of non-functional issues (performance, stability, etc.) arise and bug counts shoot up. Dodo tells the customers that everything is normal and the software will be shipped in time.

Increasing pressure: Dodo is unhappy with the rising bug counts and increases pressure on the development and testing teams. However, no progress seems to happen, and bug counts keep going up no matter how many bugs are fixed.

Adding more people: When it becomes doubtful that the release deadline will be met, he tells the customer that the team is facing a few minor issues. He pacifies the customers, saying that more programmers have been added to the project. The situation only worsens.

Process, what process? When the house is burning, there is no place for etiquette and manners, so he removes unnecessary progress bottlenecks and niceties such as peer reviews and impact analysis for change requests. He is a practical man and focuses on shipping the software within the sacred milestone date.

Delay is inevitable: When it is clear that the software is unstable and milestone dates cannot be met, he tells the customer that the software will be ready for release after a delay of a few months. The customer is very unhappy.

Ship on an as-is basis: When it dawns on him that he cannot postpone the release anymore, he tells the customer that there are minor glitches and the software is ready for release. The customer is frustrated with the quality of the shipped software and escalates the issue to the management.

Fire the programmers: Management forms a task force to conduct a root-cause analysis of the whole problem. The team makes clear findings that it is the poor quality of the technical people on the team that is the cause of all the problems (to quote from the report, "quality of software is as good as the quality of its people").

The team further suggests adopting agile methods and using the latest technologies to avoid repeating this situation in future. Management fires the architect, the team leads and key developers; further, a decision is taken to adopt agile methodology for all new projects, and to use only Java and .NET technologies in future.

Promotion: Dodo has gained experience in managing software projects, and management finds that he has special skills in crisis management! He gets promoted and is assigned a bigger team with a new and larger software project to manage.

It starts all over again: Ever optimistic, Dodo prepares a reasonable estimate based on his past experiences in managing projects...

This story is not meant to disrespect any managers. Also, I am not being cynical: what I described is reality. Optimistic estimation of effort for software development and poor project management are the two most important factors that result in buggy software (poor craftsmanship is another important cause, but that's a topic for another column). To put it simply, when software is developed under undue pressure and getting the work done becomes the priority, quality is naturally compromised, and that is the major cause of software bugs.

Errata: Bugs seem to be part of my life, including a bug in my January column article! The range of the Java byte data type was wrongly mentioned as -127 to +128; it is obviously -128 to 127, so the value of variable b3 in that article is -128 and not -127. I thank Krishna M for noting and sharing the silly mistake I made.


The Legacy of C
Dennis Ritchie died on the 8th of October 2011, at the age of 70. His lasting contributions to computing include creating the C language and co-creating the UNIX operating system. In his memory, let's reflect on the unique aspects of the C programming language in this column.

I started learning C in 1996, and have had fun programming in it for many years now. It was also the first programming language for most of my classmates. Most students today learn languages like Java; no doubt it's a safer language to program in, and hence a good option to start with, but I think they miss the fun of programming for the machine. For example, I remember writing a C program that switched on the keyboard Caps Lock (without pressing the actual key). More fun was graphics programming by writing directly to video memory, and the fancy things I could do by creating icons and windows (in the old days of DOS) by switching a matrix of pixels on and off, as needed.


I hope this article will motivate students to learn and explore the joys of C. This article is also to remember the contributions of Dennis Ritchie, by re-looking at the C programming language. C is not a perfect language, and writing programs in C is often like walking (or running) on a slippery slope. As Dennis himself commented, C is "quirky, flawed, and an enormous success".

C is quirky; take, for instance, the way arrays, strings and pointers are related, and how this relationship can be exploited. As an example:

while(*t++ = *s++);

Given that s is the source string to be copied and t is the destination, this while loop copies the string from s to t. This curt code is possible because of the following: strings are implemented as arrays of characters, and the start of a string is an address (a pointer). We can traverse an array by starting from the base of the array, and perform pointer arithmetic to access the elements. In this code, as long as the characters from the source are non-NULL characters, the truth value in the while loop is non-zero (which is considered true), and hence the characters will be copied to the destination. When the source character value in the string is '\0' or NULL, the while condition will be zero, and hence the loop will terminate. The result is that it copies the string from source to destination.

Of course, lots of things can go wrong in code like this. Here, in the expression *s++, it is difficult to tell which operator has higher precedence: is it dereference (*) or postfix increment (++)? If you look at the large operator precedence table, you'll find that postfix increment (++) has higher precedence than dereference (*), and hence s++ is executed first, followed by *. However, because ++ is postfix here, s++ does not take effect until the end of the statement (or, more technically, the sequence point), and hence *s++ will be the value of the current character of the string to which s points. Also, from *s++, it is not clear if the ++ applies to the underlying location in the string, or the character in the string. Since ++ is applied first, it applies to the address in the underlying string, which has the effect of changing the address to point to the next character.


Further, in the while loop, we purposefully use = instead of == (to assign the character). As you know, this behaviour is prone to bugs; in fact, mistyping = instead of == is one of the most common sources of bugs in C.

Similarly, there are many other quirks. Consider the break and continue statements, for example. The break statement can be used within switch statements or the body of loops (while, for, and do-while). However, the continue statement can be used only within the body of loops, and not within switch statements. That's a quirk. By default, if we forget to use a break statement, control will fall through to the next statement. If you think about it, it makes sense to allow continue also: it could direct the control flow to continue to the next case statement, instead of having the default behaviour be to fall through to the next statement. In this way, it could have also prevented countless bugs caused by forgetting break statements within switch statements. Because of quirks like this, C is perhaps one of the very few programming languages in which a book has been written on its traps and pitfalls (C Traps and Pitfalls, Andrew Koenig, Addison-Wesley, 1989).

C is also flawed in many ways. For example, consider the following statement:

if(variable & BIT_FLAG != 0)

What we are perhaps trying to do here is to check if the variable has BIT_FLAG set or not. However, the expression is treated as if( variable & (BIT_FLAG != 0) ) and not as if( (variable & BIT_FLAG) != 0 ). Why is this? Because the precedence of the relational equality operators (== and !=) is higher than that of the bitwise operators (such as &, | and ^). However, other bitwise operators, such as >> and <<, have higher precedence than the relational equality operators (which is correct). Then why this mistake? An old mail from Dennis Ritchie explains how this happened:

From decvax!harpo!npoiv!alice!research!dmr Fri Oct 22 01:04:10 1982
Subject: Operator precedence
Newsgroups: net.lang.c


The priorities of && || vs. == etc. came about in the following way. Early C had no separate operators for & and && or | and ||. (Got that?) Instead it used the notion (inherited from B and BCPL) of truth-value context: where a Boolean value was expected, after if and while and so forth, the & and | operators were interpreted as && and || are now; in ordinary expressions, the bitwise interpretations were used. It worked out pretty well, but was hard to explain. (There was the notion of top-level operators in a truth-value context.)

The precedence of & and | were as they are now. Primarily at the urging of Alan Snyder, the && and || operators were added. This successfully separated the concepts of bitwise operations and short-circuit Boolean evaluation. However, I had cold feet about the precedence problems. For example, there were lots of programs with things like:

if (a==b & c==d)

In retrospect, it would have been better to go ahead and change the precedence of & to higher than ==, but it seemed safer just to split & and && without moving & past an existing operator. (After all, we had several hundred kilobytes of source code, and maybe 3 installations.)

However, despite all its quirks and flaws, C is one of the most widely used languages today! If you look at the TIOBE programming community index for October 2011 (see Table 1), what is remarkable is that C still holds the second position. What's more, it's gaining popularity.
Table 1: TIOBE Programming Community Index (October 2011)

Position  Programming Language  Ratings
1         Java                  17.913%
2         C                     17.707%
3         C++                    9.072%
4         PHP                    6.818%
5         C#                     6.723%
6         Objective-C            6.245%
7         (Visual) Basic         4.549%
8         Python                 3.944%
9         Perl                   2.423%
10        JavaScript             2.191%


By November (when this article appears), it will perhaps be in the top position! Though we should not assign too much importance to language popularity and ratings, it is still noteworthy that C continues to be one of the most widely used languages in the world today. Also, what is remarkable is that the other popular languages in this list (Java, C++, C#, and Objective-C) are heavily influenced by, and are direct or indirect descendants of, C (though other languages like Simula and Smalltalk have more influence on these languages when it comes to OO). This influence is obvious in the form of basic data types, operators, keywords, syntax (such as using curly braces for blocks), etc., in these languages. What is not obvious is the influence of C on various other aspects, such as semantics and pragmatics, of these languages. For example, one of the first few languages to separate I/O functions from the core language and move them into a supporting library was C; the other languages listed here follow this tradition. The combined popularity of these five languages in the table is more than 50 per cent, which is a staggering number!

C is clearly not the cleanest language ever designed, nor the easiest to use, so why do many people use it? This is the question that you might find yourself asking. Here is Stroustrup's answer to it:

- It is flexible [to apply to any programming area]
- It is efficient [due to low-level semantics of the language]
- It is available [due to availability of C compilers on essentially every platform]
- It is portable [can be executed on multiple platforms, even though the language has many non-portable features]

Let's discuss each of these points now. C is a powerful and flexible systems programming language. Though it was originally designed for writing the UNIX OS, it is today used for a wide variety of systems programming, such as database management systems, compilers and virtual machines, Web servers, text-processing systems, telephone-switching systems, etc. All this is because of its flexibility in various ways. For example, C is not strongly typed; these days, many of the new languages are strongly typed.
Strong typing allows one to catch mistakes (related to data-type usage) early, and hence helps develop more robust code. For example, in K&R C (i.e., before standardisation), implicit conversions between pointers and integers were allowed, which is bug-prone; standard C has since disallowed them. Today, most C compilers do stronger type-checking, and warn of potential mistakes. Still, C allows us to override these checks. The ability to write type-unsafe code is useful and important because it is often required in low-level programming tasks such as writing device drivers.


C is so flexible in its syntax that it is possible to misuse this flexibility, for example, to write obfuscated code in it; in fact, since 1984 there has been a yearly contest for writing obfuscated C code: The International Obfuscated C Code Contest!

C is also efficient. For example, unlike most other languages today, there is almost no C runtime. At runtime of a C program, what exist are the memory-management routines, etc., but there is no sophisticated runtime support. Comparing the C runtime with the Java runtime would be wrong, since Java is meant for application programming, but the comparison helps us understand what "almost no runtime support" means. The JVM is a sophisticated runtime, which performs various tasks such as checking the validity of the bytecode to execute (seeing if the code is well-formed, if the program is safe to execute, etc.), loading and unloading the bytecode from Java class files as needed, supporting GC, and checking instructions before executing them so as to throw an exception if needed (such as divide-by-zero, out-of-bounds access, etc.).

C's syntax and semantics are so close to the machine that it is often considered a low-level programming language, almost an abstraction over assembly code. Today's enterprise-quality optimising C compilers can generate code as fast as assembly. Further, the size of the code generated by C also tends to be (arguably) small. It can be argued that some of C's features help in that. Consider the example of the postfix ++ operator. Ken Thompson, while implementing the B compiler, noticed that increment operations generate more compact code than adding 1 to a variable and storing it back. In other words, an increment was one instruction, while adding 1 and storing it back was more than one instruction. So he implemented the prefix ++ (and --) operators, and generalised them by adding a postfix version. This tradition of prefix and postfix operators continues today in languages like Java.

C is also portable.
This statement might seem surprising in the context of VM languages such as Java, which are popular today because they work without change (well, experience indicates, mostly without change) on different platforms. C is also available on almost all platforms today. The original C compiler written by Dennis Ritchie had numerous dependencies on the features of the PDP-11 (the early machine for which C was first implemented). Around 1978, Steve Johnson wrote a portable compiler for C (the Portable C Compiler, or pcc for short, now available under a BSD licence). It made the task of porting the compiler to various other machines and platforms easy. This helped spread C fast, together with the fast growth of UNIX installations (C was the primary language for UNIX machines). It is used in embedded devices, such as microwave

ovens, supercomputers, etc. However, given that C is a low-level language, its portability is surprising.

Consider pointer arithmetic as an example. Given that a pointer is an abstraction of a (machine) address, one would logically assume that pointer arithmetic would require knowing the size of the data type to which the pointer points. However, since the data type is encoded in the pointer type, the compiler automatically calculates the sizes required for pointer arithmetic. For this reason, pointer arithmetic, though low-level, is portable, since the size of the data type is abstracted. Of course, the size of the data type itself differs from machine to machine. For example, the size of the int data type is implementation-dependent. However, this implementation dependency aids efficiency: the compiler can use the native size of the data type on that machine, and hence produce faster code. If the size of int were fixed in the language, say to 4 bytes, it would be difficult (or even impossible) to port C to tiny machines; further, even if it were possible, such hard-coding would make compiled programs comparatively inefficient.

Now, consider another example of portability: floating-point types in the switch statement. In C, we can use only integral types in switch-case statements, not floating-point types. This is because, if it were allowed, it would require direct comparison of floating-point numbers, which is not portable. To explain: the implementation of floating-point numbers can differ across machines (though these days the IEEE 754 standard for floating-point arithmetic is almost universally followed). Many real numbers cannot be accurately represented in the floating-point format; for example, the common real number 0.1.
Hence, comparing floating-point numbers directly in equality checks can give wrong results, and, worse, different results on platforms with different implementations of floating-point numbers. If switch statements were to allow floating-point numbers, comparing the switch condition value with the case values would require direct comparison, the results of which could vary across platforms. Hence, floating-point values are not allowed in switch statements in C! (Though some modern languages allow them, they also mandate that implementations follow the IEEE 754 standard, and hence it is not much of a problem in those languages.)

To get an idea of how important Dennis Ritchie's influence and contributions are, even today, just find out the name of the highest-selling book in computer programming. As on 14th October 2011 (when this article was written), "The C Programming Language" by Brian Kernighan and Dennis Ritchie ranks No. 225 overall in Amazon's best-selling list, and is the top-selling book in the Books -> Computers & Internet -> Programming category!

The book has an example of printing "hello, world" to the console, something that has become a tradition followed by most tutorial books on programming languages. To summarise, C is an interesting language to learn, and is fun to work with. It is also a small language, and behind its veil of simplicity lies power; it just requires many years of experience to understand and appreciate this fact.


The Technology Behind Static Analysis Tools


There is a wide range of static analysers available today, both commercial and open source. Have you ever wondered how static analysers magically detect difficult-to-find bugs in code? And why some commercial static analysers are extremely costly? Have you ever thought about how difficult (or how easy) it would be to write your own static analyser?

To answer these questions, we need to understand the technology behind static analysers. In this column, I will first provide a detailed overview of static analysers and then delve into the different analysis techniques, many of which are formal, i.e., based on applied mathematics for modelling and analysing computer systems (hardware or software). For each technique, I will also mention widely adopted (open source or commercial) tools that use the specific technology, and highlight the possibilities of implementing your own static analysers (i.e., ideas for your six-month academic project).

Analysing programs to gather facts about them is known as program analysis. This can be performed dynamically, i.e., by actually executing the program and gathering facts. For example, when you test a program, you are performing dynamic program analysis, where you check if the program leaks memory, or fails with a null-pointer access exception. Program analysis can also be performed statically, i.e., by gathering facts without actually executing the program. For example, when you review code, you don't actually execute the program; you just analyse the program in your mind, and find bugs in it, such as null-pointer accesses.

So what are static analysers? They are tools that analyse the program without actually executing it. Static program analysis can be performed for a wide variety of reasons; the two main applications are to optimise code, and to find bugs. A compiler optimiser analyses programs to understand how it can generate more efficient code. Bug-detection tools analyse programs to see if there are any mistakes, such as buffer overflows, that can lead to runtime errors. Our focus is on static analysis to find bugs.

Before we go ahead, note that static analysers can be used for purposes other than finding bugs: to find instances of the use of design patterns, to find duplicate code segments (code clones), to report metrics (measurement results such as Depth of Inheritance Tree), for code comprehension and reverse engineering (generating documentation or higher-level design diagrams to aid understanding of the program), etc. Also note that static analysis can be performed on different software artifacts, such as design diagrams (e.g., UML diagrams), structured content (e.g., XML files), grammars (e.g., yacc programs), etc. Further, the input to static analysers need not be just source code; it can also be byte code (as in Java/C#) or executable code (e.g., native code generated from C/C++ programs).
Here, we mainly focus on techniques to find bugs, i.e., code or design problems, from code.

An overview of technologies
Static analysers are implemented using a wide range of technologies. I'll describe them starting from the simple ones and moving to the more complex. A warning before we proceed: topics such as theorem proving and abstract interpretation are quite technically complex, so I will present the overall idea behind each technique and leave it to you to explore the concepts further and figure them out.

Bug pattern matching

Some bugs are easy to find, even without the use of sophisticated technologies. For example, it is a common mistake in C/C++ to type = instead of == in condition checks. We can easily detect this bug pattern by checking if the = operator is used in condition checks. This is typically performed by matching the coding pattern in the program against the expected bug pattern at the AST (Abstract Syntax Tree) level, as in the case of the classic, free lint program in UNIX (today, Gimpel sells better lints under the names PC-lint for Windows and FlexeLint for UNIX flavours; see gimpel.com). FxCop is a free C# tool from Microsoft that matches bug patterns at the byte code level.

One main advantage of bug pattern-matching is that the tool can execute quite fast, even on a large code base, and even on partially written programs (i.e., work-in-progress code with syntax or semantic errors that won't compile successfully). The main disadvantage of bug pattern-matchers is that they are not effective in finding useful runtime errors such as null-reference or divide-by-zero errors. Since their analysis is shallow, they report wrong or false errors, technically referred to as "false positives".

Most bug pattern-matching tools provide support for extending the tool. For example, FxCop has a documented API, and you can write your own (i.e., custom) rules using it. The Eclipse IDE supports JDT and CDT for Java and C/C++, respectively, and JDT/CDT's ASTs are exposed as APIs. If you learn the AST and the API, you can write a bug detector as a summer project. Since lint is perhaps the earliest of static analysers, even today, when people refer to static analysis tools, they have a lint-like tool in mind. However, there are now sophisticated and varied technologies used to find bugs, as we'll see later in this article.

Data-flow analysis

In data-flow analysis (DFA), runtime information about the data in programs is collected statically. This analysis is typically performed by traversing the control-flow graph (CFG) of the program. Now, what is a CFG? It can be thought of as an abstract representation of the functions in a program, in the form of a graph. Each node in the graph represents a basic block, and directed edges represent jumps in the control flow. And what is a basic block? It is a sequence of statements where control enters at the beginning and leaves only at the end, and cannot halt or branch out of the block (except, of course, at the end).

DFA can be performed to find bugs such as null-pointer access: from the point where the pointer variable is initialised to the point where it is de-referenced, we can find the path(s) in which the value of the pointer variable is still null when it is de-referenced. DFA can be intra-procedural or inter-procedural, i.e., the analysis can be limited to within a function, or extended to the whole program. The analysis is typically performed using standard algorithms, and they are not computationally intensive. However, analysing the whole program is costly in terms of processing time and/or the memory space required. Hence, many static analysers limit themselves to intra-procedural analysis. For example, FindBugs is an open source tool that performs bug pattern matching for simple problems, and performs DFA to detect problems such as null-pointer access at the intra-procedural level. DFA is mainly used by compiler optimisers to generate efficient code. DFA does not gather much useful information based on the semantics of the programming language and its operators; hence, while it is useful for finding bugs, it is still not very effective.

Abstract interpretation

Abstract interpretation approximates the program semantics by replacing the concrete domain of computation and its operations with an abstract domain of computation and its operations. I know this description is confusing; so, let me explain abstract interpretation with a standard introductory example about the rules-of-sign that we learnt in school. Consider the expression (-123 * 456). What is the sign of this expression? Without actually calculating the resulting value, we can say that the expression results in a negative value. How? We know the rules-of-sign: multiplying a negative value with a positive value results in a negative value. In other words, the expression can be abstractly interpreted as (negative-value * positive-value) => negative-value. If we actually perform the arithmetic to find the sign, we are performing concrete interpretation; if we abstract the values to their signs and perform the arithmetic on those, we are performing abstract interpretation.

Now, how is this useful for finding bugs? Consider a simple C example:

    float f1 = -4;
    float f2 = 4;
    printf("%lf", sqrt(f1 * f2));

We do not have to actually evaluate (concretely interpret) the expression f1 * f2 to find that it results in a negative value, and that it is an invalid arithmetic operation to try to take the square root of a negative number; we can reach the same conclusion if we abstractly interpret the expression.
There are many commercial tools that use abstract interpretation to find bugs; for example, Polyspace from MathWorks. Abstract interpretation is computationally very expensive, and choosing an appropriate abstract value domain, and heuristics for determining termination, are important to make it practically usable on large code-bases. Most commercial tools that use this technology are also costly.

Symbolic execution is analysing the program by tracking symbolic values instead of actual values. In a way, symbolic execution (or analysis, or evaluation) is abstract interpretation; in fact, every kind of deeper analysis performed without executing the program can be seen as abstract interpretation!

Model checking

Program execution can be viewed as the transition of the program from one state to another. Most states are valid, and some are error states; examples of error states are the program states when divide-by-zero, deadlock or buffer-overflow happens. Model checking is an automated verification technique, where the possible system behaviour (i.e., implementation behaviour) is matched against the desired behaviour (specified properties). In other words, a model checker ideally checks all possible system states and verifies if the given properties hold. If a property does not hold for a certain reachable state, the property is violated, and a counter-example is reported to the user about the violation. Java PathFinder (JPF) is an open source tool that explicitly constructs state models to check the software.

In practice, exhaustive checking of all system states is not feasible for commercial software, which often consists of millions of lines of code. In other words, if the transition system representing the program has too many states, it becomes very difficult to check the system against the properties; this is known as the state-explosion problem. Many techniques are being developed to address this problem, and some solutions are already widely used. One is to construct only part of the state space of the program; as state transitions are explored, more states are built as the need arises. Another approach is to use symbolic checking. In this approach, the states and transitions are implicitly represented using Boolean formulas, known as Binary Decision Diagrams (BDDs).
Now, solvers that work on BDDs can be used, and this simplification considerably pushes the limits on the size of the programs that can be checked. For example, the SLAM/SDV tool from Microsoft automatically creates a Boolean program abstraction from a C program, and model checking is applied to the resulting Boolean program. The SDV tool is shipped with the Windows Driver Kit (WDK). Model checking can find defects that are generally hard to detect using conventional techniques like testing. Also, model checking can be used for sequential as well as concurrent programs (as we know, concurrent programs can have bugs that are non-deterministic, and hence model checking is very useful for them).

Program querying


The fundamental idea behind program querying is that a program can be viewed as structured data (in other words, a database), and we can perform queries on it to get the necessary information. In implementing this idea, the program can be implicitly treated as a database, or an actual database such as MySQL can be used explicitly. Similarly, for querying the data, one can use an SQL-like language, or SQL itself. For example, NDepend is a tool that generates code and design metrics. With its Code Query Language (CQL), we can write SQL-like queries to obtain data on the program. A list of program query languages is provided at cs.nyu.edu.

Logic programming languages such as Prolog use a database of facts, and queries in these languages allow relationships to be inferred from the given set of facts. For this reason, Prolog and its variants, such as Datalog, are widely used for static analysis, particularly for inferring design patterns or anti-patterns. For example, the JTransformer tool translates a Java program to Prolog facts. It is then possible to implement tools such as Cultivate, which use the Prolog facts generated by JTransformer to infer design metrics as well as violations.

Static analysis is a useful and cost-effective way to find bugs early in the software development life-cycle, and complements other approaches such as testing. In this article, I have outlined different technologies for static analysis, and I hope you'll appreciate that the technologies applied for static analysis are advanced. If you're a student, you'll find writing tools that automatically find bugs interesting and challenging; you can start learning about the technology that interests you, and implement your own detector. Many widely used free tools, such as PMD, FindBugs, CheckStyle, FxCop and StyleCop, provide extensibility facilities for writing your own rules; so you can also learn by implementing new rules and testing them, before starting to write a full-fledged bug detector.


The Broken Window Theory


It is common to see software projects fail. One important cause is design and code rot. In this article, let's try to understand the causes, in the light of a popular theory.

Development projects are often completed within tight deadlines to deliver working software, and hence managers focus only on externally visible product quality aspects, such as reliability, stability, performance, security, etc. Other less visible or not immediately measurable aspects, such as maintainability or reusability, are generally ignored. Most projects fail because they do not meet customer requirements (typically, non-functional requirements such as reliability); if the project survives, it moves on to the maintenance phase.

During maintenance, changes are made to fix or enhance features in the software without much focus on improving design or code quality. If such changes continue to be made, the design and code start decaying; the visible symptoms are known as "code smells" in the refactoring community. If efforts to take up refactoring activities are not made, the project reaches a situation in which developers dread touching the code. It becomes extremely difficult to understand the design and code, so any attempt to make even minor changes could break the software! When the software becomes fragile, managers and customers wake up and try to do something to get the situation

under control. However, at this point, it is often too late to address the problem, and hence the project gets scrapped. Soon, someone decides to use some other software, or to write new software from scratch. For large enterprise software, the effort required for such re-engineering activities often costs millions of dollars.

Why does software decay happen so quickly? Why don't developers follow good programming practices to keep the design and code clean? One way to explain this phenomenon is through the broken window theory, first introduced by Wilson and Kelling in 1982: "Consider a building with a few broken windows. If the windows are not repaired, the tendency is for vandals to break a few more windows. Eventually, they may even break into the building, and if it's unoccupied, perhaps become squatters or light fires inside."

In India, this theory is easy to explain using the traffic jams that happen so very often. When a few vehicles break the rules and create confusion in the absence of traffic police, others, too, break the rules and make their own way through the traffic, which quickly leads to chaos!

In a software project, developers often do notice that the existing design and code are not clean, yet managers and leaders focus on getting the work done as soon as possible rather than on getting it right. Given that programming best practices have already been abandoned, there is no reluctance in breaking more rules, particularly when no one notices. This quickly leads to chaos, the software becomes fragile, and the project ends up being scrapped.

A successful approach to the broken window problem is to address the situation while things are under control, and while the problem is small. This is especially true for software; things can go out of control very quickly.
It is easier to do small refactorings with every fix or enhancement than to get approval for refactoring activities that require large budgets and a lot of time. It is understandable that managers can't usually get approval to take up long-lead-time refactoring activities, but no one stops them from allocating a little extra time to ensure the quality of the code with every change made. Developers should also be aware that breaking programming best practices is taken as a serious problem. To a great extent, these two approaches will keep maintenance projects under control.


Levels of Exception Safety


The concept of exception safety is important for programming in the presence of exceptions. In this article, we'll look at different levels of exception safety, with the help of an example.

Let's first look at an actual code example to understand why exception safety is important. The following Java code is from Axion DB ver 1.1, from the file axion/whiteboard/one/src/org/axiondb/util/BTree.java:


    public void read() throws IOException {
        // ...
        FileInputStream fin = new FileInputStream(_idxFile);
        ObjectInputStream in = new ObjectInputStream(fin);
        _leaf = in.readBoolean();
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            _keys.addInt(in.readInt());
            _vals.addInt(in.readInt());
        }
        in.close();
        fin.close();
    }

Can you find out what can go wrong in this code? Here, the close() method will not be called if any exception is thrown after fin and in are initialised, inside the for loop. The Java idiom is to put them in finally blocks, to ensure that the close() statements are called even if an exception has occurred, in order to avoid resource leaks. In other words, this code has no exception safety. A method can have four levels of exception safety:
1. No exception safety: There is no guarantee on the effect of throwing an exception. Because of the exception, resources might leak, and/or the underlying object can be left in a corrupted state.
2. Basic exception safety: No resources are leaked. The operation might have caused some side-effects, but the object is in a consistent state (i.e., invariants are preserved). The state of the object might have changed.
3. Strong exception safety: No resources are leaked. The operation might either completely fail or fully succeed, but cannot be partially complete. In other words, commit-or-rollback semantics are implemented.
4. No-throw exception safety: Operations are guaranteed to succeed, and no exceptions will be thrown.

This concept is language-independent, and hence applies to languages like C++, Java and C#. Since "safety" here means the level of assurance given by a method, it is also known as a "guarantee". No exception guarantee means that the function is really unsafe in the presence of exceptions, and that such a function can lead to resource leaks, or can corrupt the objects it manipulates. A basic exception guarantee only means that the function will not leak resources, whether an exception has occurred or not. Still, a basic exception guarantee can leave objects in a partially changed (but consistent) state.

A strong exception guarantee provides commit-or-rollback semantics (as in transaction processing in database-management systems): it ensures an operation is either fully executed or not executed at all, but never partially executed. Hence, in practice, this behaviour is desirable for the programs that we write. It is not practically feasible to always guarantee that methods will never throw any exception, or that they will always succeed (and never fail). However, it is possible in some cases, such as an implementation of a swap method, which exchanges two variables and will never throw an exception.

Let us look at an example to understand these levels of guarantees. Assume that you're implementing an application that manipulates huge text files. One piece of functionality it provides is removing duplicate lines in the text files. Assume that this functionality is implemented using a method named removeDuplicates, which takes a file handle as the argument. Also assume that removeDuplicates can create a temporary file for handling large text files. Now, the exception safety levels for removeDuplicates are as follows:

1. No exception safety: While removing duplicates, the method can throw some exception. The temporary and input files, if opened, might not get closed. The input file might be left corrupted, since it is partially manipulated and not closed properly.
2. Basic exception safety: The temporary and input files, if opened, would be closed. The input file might have only some of the duplicate entries removed, leaving it in a partially complete state.
3. Strong exception safety: The temporary and input files, if opened, would be closed. Either the input file would be left untouched, or all duplicates would have been removed. This might be implemented by initially copying the contents to a temporary file.
If the duplicate removal failed in between, the input file would remain untouched (rollback); if duplicate-line removal succeeded fully, the input file would be replaced entirely with the contents of the temporary file (commit).

4. No-throw exception safety: No exceptions can get thrown out of removeDuplicates, and the method must always succeed; it is not possible to implement such a removeDuplicates method that will always succeed.

No exception safety is not acceptable in practice, as this duplicate-line removal example shows. We should at least provide basic exception safety; in this example, basic exception safety is not sufficient, but achieving even that is better than no exception safety.
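To make the commit-or-rollback idea concrete, here is a hedged, self-contained sketch (the class and method below are hypothetical; the article only describes removeDuplicates, it does not show it). The result is built in a temporary file and moved over the input only on success, so a failure midway leaves the input untouched:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupSketch {
    // Strong exception safety: work on a temporary file, and replace the
    // input only after the whole operation has succeeded. If anything
    // throws midway, the input file is left untouched (rollback).
    static void removeDuplicates(Path input) throws IOException {
        Path tmp = Files.createTempFile("dedup", ".tmp");
        try {
            // LinkedHashSet drops duplicates while preserving line order.
            Set<String> unique = new LinkedHashSet<>(Files.readAllLines(input));
            Files.write(tmp, unique);
            // Commit: replace the input with the deduplicated contents.
            // (An atomic move, where the filesystem supports it, would
            // strengthen the guarantee further.)
            Files.move(tmp, input, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp);   // no leaked temp file on failure
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("lines", ".txt");
        p.toFile().deleteOnExit();
        Files.write(p, java.util.Arrays.asList("a", "b", "a", "c", "b"));
        removeDuplicates(p);
        System.out.println(Files.readAllLines(p));   // duplicates removed
    }
}
```

This sketch reads the whole file into memory, so unlike the huge-file scenario in the article it is only meant to show the commit-or-rollback shape of the operation.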


We cannot ensure no-throw exception safety for all methods; it can be achieved only for a few critical methods. Strong exception safety is the desired level for real-world applications, as illustrated in this example.
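To make the strong guarantee concrete, here is a sketch in C++ (hypothetical: it takes a file path rather than a file handle, and it keeps the first occurrence of each line). All the work happens in a temporary file, and the original is replaced only after everything has succeeded:

```cpp
// Sketch of the strong exception guarantee for a duplicate-line remover.
// RAII closes the streams even if an exception propagates; the rename at
// the end is the "commit", and any failure before it leaves the input
// file untouched (the "rollback").
#include <cstdio>
#include <fstream>
#include <set>
#include <stdexcept>
#include <string>

void removeDuplicates(const std::string& path) {
    const std::string tmpPath = path + ".tmp";
    {
        std::ifstream in(path);
        if (!in) throw std::runtime_error("cannot open " + path);
        std::ofstream out(tmpPath);
        if (!out) throw std::runtime_error("cannot create " + tmpPath);
        std::set<std::string> seen;
        std::string line;
        while (std::getline(in, line))
            if (seen.insert(line).second)   // write first occurrences only
                out << line << '\n';
        // both streams are closed here, exception or not (RAII)
    }
    // Commit: replace the original. If anything above threw, we never
    // reach this point, so the input file remains untouched.
    if (std::rename(tmpPath.c_str(), path.c_str()) != 0)
        throw std::runtime_error("cannot replace " + path);
}
```

Note that the strong guarantee comes almost for free once the work is done on a copy; the only delicate step is the final rename, which on most platforms replaces the file in one operation.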


Bug Hunt
Every programmer knows that debugging is akin to a detective's work. In this column, we'll cover this bug-hunt process with an example of how the Intel Pentium processor bug was discovered.

Software can fail in unexpected ways and in the least anticipated situations. Small programs are easy to debug, but large software (often more than a million lines of code) is really difficult to debug. Bug hunts are usually enjoyable, because they are challenging. But at times the job can get frustrating, especially when debugging takes many weeks! When the bug is finally discovered, it is an "Aha!" moment: joy and relief at seeing the mystery unravelled! Let's look at one instance of a hunt, where a bug was discovered in the hardware.

What was the bug?

Thomas R Nicely, a mathematician, found a flaw in the floating-point division (FDIV) instruction (in the Pentium processor's Floating Point Unit) in 1994. The problem was five missing entries in the lookup table used to implement the radix-4 SRT division algorithm. The bug got exposed only in rare cases. For example, the expression (824633702441.0)*(1/824633702441.0), which should equal 1, evaluated to 0.999999996274709702 with the Pentium division bug. For typical or normal uses of the computer, one would probably never encounter this bug; however, in scientific computing (like numerical analysis), the chances of facing it were higher. In general, there was a very small probability of making a very big error with this bug. The Pentium bug cost Intel hundreds of millions of dollars in replacing the chips. We'll look at how Nicely went about his bug hunt before confirming that the bug was in the hardware.

The steps in the hunt

Nicely was working on computational number theory (on prime numbers). He used a number of systems to do calculations, and then added a Pentium machine. In June 1994, he found that the computed values of PI (for a large number of digits) were different from the published value. Nicely first thought that it might be a logic bug or a problem with reduced precision. He also found that the Borland compiler was giving wrong results when some compiler optimizations were enabled. Having disabled the optimizations, and after using long double (instead of double), he found some new problems: the results of some floating-point calculations differed between the Pentium and other hardware. Through trial and error, doing binary searches to locate the problematic values, he isolated the problem to two prime numbers: 824633702441 and 824633702443. He disabled the optimizations of the Borland compiler, but the error still reproduced. Then he tried disabling the FPU, but made some mistakes, so the FPU did not get disabled; hence, he thought that the bug was in the PCI bus. Finally, he purchased a Pentium machine from another manufacturer, with a different motherboard: the bug still reproduced! When he used Power Basic instead of C, the bug was still there. Then he disabled the FPU unit, and the error disappeared. Finally, he tested the code on yet another Pentium machine, from a different manufacturer, and found the bug occurred on it too. With this, Nicely was sure that the bug was in the Pentium FPU!

Lessons for us

We can learn many things from this bug hunt: the need for a methodical approach in hunting down bugs; trying to isolate the bug one step at a time; clearly knowing how to reproduce the bug; having the tenacity to keep hunting, and never giving up; never assuming that the bug must be in the application we have developed (it might be hidden in the layers beneath it); and so on.


Language Transition Bugs


There are subtle differences between languages like C, C++, Java and C#. Programmers transitioning from one language to another should beware of such differences.

When I speak to my Tamil friends in English, I don't really speak English, it's "Tanglish" (Tamil + English)! For example, "yes?" (is it so?) and "no?" (is it not?) become "yes-a" and "no-a"! In Tamil, questions are formed by adding an "a" suffix to words, so Tamilians find it convenient to form questions in English by adding the same suffix to English words! To someone who doesn't understand Tamil, such words are not just amusing; they can be baffling too! As we can see, there are many pitfalls when we think in one language and speak in another. As with natural languages, when we have considerable experience programming in one language and start programming in another, new language, there are numerous pitfalls associated with that transition. Languages like C++, Java and C# are closely related, because they inherit a lot from C. Code segments that have similar syntax can have subtly different semantics, and transitioning between these languages can cause bugs. A very good example is the ~ symbol used for destructors in C++ and C#: the syntax and the name of the feature (destructor) are the same, but

there is a subtle difference. In C++, destructors are deterministic (the destructor is called immediately after the object's use is over), whereas in C#, destructors are non-deterministic (they are called by the Garbage Collector, and need not run immediately after the use of the object is over). A C++ programmer who's new to C# can introduce bugs by assuming the deterministic behaviour of destructors while coding in C#. Another example is the C++ virtual keyword. In C++, using the virtual keyword in the base method makes the method a virtual method, and the overriding method need not repeat the virtual keyword; however, repeating it in the overriding method is recommended as a best practice, since it improves the readability of the code. If a seasoned C++ programmer starts programming in C# and follows this C++ best practice there, she'll introduce a subtle bug: in C#, if you use the virtual keyword in the overriding method, it becomes hiding, and not overriding! Okay, let's look at some actual code segments illustrating such language transition bugs; specifically, we'll focus on Java and C# differences. Here is our first code segment:

int foo() { return true ? null : 0; }

Well, your first question will be: does it really compile? The answer is: it depends! In C#, you'll get a compiler error for attempting to convert a null to an int. But you'll be really surprised that in Java, it compiles, and then fails with a NullPointerException during execution. The code compiles in Java because of some arcane language rules on boxing and un-boxing. Now, how about this code?

int i = 10; int j = 10; return ((Object)i == (Object)j);

If you're a Java programmer, you would say the code returns true; if you're a C# programmer, you would say false! In C#, boxing a primitive type to a reference type creates two different reference-type objects, and hence the condition check returns false. In Java, boxing a small int value (in the range -128 to 127) reuses a cached boxed object, and hence it's true! Here is our last example:

byte b1 = 127; byte b2 = 1; byte b3 = (byte)(b1 + b2);

Now, what is the value of b3? It's 128 for a C# programmer and -128 for a Java programmer! In C#, a byte is unsigned, so the range is from 0 to 255; in Java, a byte is signed, so the range is from -128 to +127!
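The Java result in the boxing comparison above depends on the integer cache, which by default covers the values -128 to 127; a small sketch (the class name is ours) makes the boundary visible:

```java
// Java caches boxed Integers in a small range (by default -128..127),
// so == on boxed values behaves differently on either side of it.
public class BoxingCache {
    public static void main(String[] args) {
        Object a = (Object)(Integer)10, b = (Object)(Integer)10;
        System.out.println(a == b);      // true: both refer to the cached 10

        Object c = (Object)(Integer)1000, d = (Object)(Integer)1000;
        System.out.println(c == d);      // false: outside the cache, two objects
    }
}
```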


Penny wise and pound foolish!


We often try to use a smaller data type to save space. Though it looks like clever programming, it can cause nasty bugs. We'll see an example in this column.

Figure 1: A bug in my insurance renewal notice

A few years back, I received my car insurance renewal notice. I was surprised to see a software bug in that notice! Check the figure (personal details and company name are hidden). Can you explain what could have gone wrong?

As highlighted in the image with a light red box, the Customer ID entry reads 1.00E+ 11, which is absurd! How could this have happened? To answer that, well first discuss the seemingly unrelated topic of using printf format specifiers in C.

In C format specifiers, when we use %f (fixed precision format specifier), it will print the number in decimal format (for example, 123.45). However, if the floating point number is big, it can end up printing a lengthy sequence of digits (for example, 12345678912345.67), which is difficult to read. When we use %e (scientific notation), it prints the floating value in the exponent format (for example, the value 123.45 will be printed as 1.234500e+02). End-users are not familiar with this scientific output format, so using it by default is not preferable. So, what is the solution?

Fortunately, there is another format specifier, %g, which mixes both these approaches: It uses decimal format for small numbers, and exponent format for large numbers. For example, if the floating point value is 123.456, %g will print it as 123.456, but if the floating value is 1234567.8, then it is printed as 1.23457e+06. If we use %G, we get the symbol e printed in upper-case: 1.23457E+06.

As you can see, %g (or %G) is a convenient format specifier to use, and that is why it is preferred in most software applications. In most other languages (such as Java), this approach (the exact details might vary slightly) is used by default.

Coming back to the insurance notice, can you now explain how 1.00E+ 11 might have got printed? I don't know what programming language was used for developing that insurance software, nor do I have access to its source code. However, with this background about format specifiers, we can make an educated guess. The Customer ID, in this case, is a number of a few digits. The programmer who wrote the code for the automation of the insurance workflow might have been stingy in using memory space. So, instead of using a string representation for the customer ID, which takes a number of characters for each ID, he (or she!) could have chosen a floating-point representation for storing the number, which takes only 4 or 8 bytes, depending on whether it is a float or a double data type. During testing, the testers might have used smaller Customer IDs, so the numbers would have been displayed correctly. However, in real-world use of the software, when the customer ID became a large number, the floating-point number was printed in exponential form: a bug!

As you can see, trying to save a few bytes of memory per customer ID manifested itself as a bug in the software. The lesson is, be careful in trying to optimise storage space.


Let's Go: A First Look At Google's Go Programming Language


Go is a new systems programming language launched by Google, and has received wide attention in the programming community. This article gives you an overview of this language, with some specific examples to help understand its features.

Go was announced just a few months back, but is already being touted as the next C language. Why? The C language itself evolved from a language known as B. C was created in the 1970s, and still continues to be widely used; however, the language has mostly stopped evolving, and there is a dire need for a new language that could replace C. There have been many languages named D (the most popular being the one by Walter Bright), or that aspire to be the "D" language; still, nothing has made the cut so far. Go might well become the next C language, or it may be "Gone" in a few years!

Is there substance behind the hype and buzz around Go? Yes, a lot! Most systems programmers (like me) find it very good after trying it out and writing non-trivial programs. Why?

- Go has a familiar C-like syntax (but not exactly C syntax, as we'll see later), yet it has the capabilities of modern dynamic languages like Python.
- Go is statically typed and almost as efficient as C (the performance of Go programs is within 10-20 per cent of the equivalent C code)!
- Though the language is designed for systems programming, it has capabilities like garbage collection and reflection, which make it a powerful language.
- Go is not an object-oriented language, but it has (arguably) novel features like interfaces, as we'll see later in this article.

Go has already won Tiobe's Language of the Year Award 2009. Tiobe is one of the most widely-referred-to programming language popularity indexes. There is a lot to cover on Go, so I'll limit myself to the most important aspects. First I'll cover the essential background information, and then I'll present some sample programs to introduce language features.

What is Go?

Go is a new, experimental, concurrent, garbage-collected, systems programming language.

- New and experimental: It is a new and evolving language that is still at the experimental stage. No production systems have yet been developed using Go.
- Concurrent: It is a concurrent language that supports communication channels based on Hoare's Communicating Sequential Processes (CSP). The concurrency support is different from lock-based programming approaches like pthreads, Java locks, etc.
- Garbage-collected: Like most modern languages, Go is garbage-collected. However, work is under way to implement low-latency GC in Go.
- Systems-programming language: Like C, Go is a systems programming language that one can use to write things like compilers, Web servers, etc. However, we can also use it as a general-purpose programming language, for applications that create XML files, process data, etc.

Robert Griesemer, Ken Thompson (of Unix fame), and Rob Pike are the creators of the language.

Goals and motivation

Why was Go created? What are the problems that it tries to solve?


According to the creators of Go, no major systems programming language has come up in the last decade, though much has changed in the systems programming arena over the same period, or from even earlier. For example, libraries are becoming bigger, with lots of dependencies; the Internet and networking are becoming pervasive; client-server systems and massive clusters are in use today; and multi-core processors are becoming mainstream. In other words, the computing world around us is undergoing considerable change. Old systems programming languages like C and FORTRAN were not designed with these in mind, which raises the need for a new language. Apart from this, the creators of Go felt that constructing software has become very slow. Complex and large programs have a huge number of dependencies; this makes compilation and linking painfully slow. The aim is for Go to be not just a language in which we can write efficient programs, but one in which programs also build quickly. Besides, object-oriented programming using inheritance hierarchies is not always effective in solving problems, so the creators of Go wanted a better approach to writing extensible programs.

Important characteristics

Look at some of the important features and characteristics of Go:

- Simplicity: Mainstream languages like C++, Java and C# are huge and bulky. In contrast, simplicity is a feature in Go's clean and concise syntax. For example, C is infamous for its complex declaration syntax. Go uses implicit type inference, with which we can avoid explicitly declaring variables. When we do want to declare them, the declaration syntax is simple and convenient, and different from C.
- Duck typing: Go supports a form of duck typing, as in many dynamic languages. A struct can automatically implement an interface; this is a powerful and novel feature of Go.
- Goroutines: They are not the same as threads, coroutines, or processes. Communication between goroutines is done using a feature known as channels (based on CSP), which is much safer and easier to use than the mainstream lock-based approach of pthreads or Java.
- Modern features: Go is a systems programming language, but it supports modern features like reflection, garbage collection, etc.

Hello world example

Here is the hello world example in Go:

package main

import "fmt"

func main() {
    fmt.Printf("Hello world!")
}


All programs in Go should be in a package; here it is main. We import the fmt package to use its Printf function. Execution starts with the main.main() function, so we need to implement it in the program. Functions are declared using the func keyword. Note the lack of semicolons at the end of the lines in this program.

Looping examples

Here is a simple program that calculates the factorial of the number 5:

func main() {
    fact := 1
    for i := 1; i <= 5; i++ {
        fact *= i
    }
    fmt.Printf("Factorial of 5 is %d", fact)
}
Note that we haven't declared the variables i and fact. Since we have used the := operator, the compiler infers the type of these variables as int, based on the initialisation value (which is 1 for both). The for loop is the only looping construct supported in Go! If you want a while loop, it is a variation of the for loop in which the loop has only a condition check. For example:

fact := 1
i := 1
for i <= 5 {
    fact *= i; i++
}


Did you notice that we used a semicolon in this code snippet? Wherever the compiler can infer semicolons, we don't have to use them, but sometimes we do. It will take a while to get used to the rules about semicolons in Go.

Looks like C, but it's not C!

Here is a function for returning the square of a number:

func square(f float) float { return f * f }


This function definition looks very similar to C, but there are differences. For example, the argument is declared as f float and not float f, as we would write in C. In Go, variable names come first, followed by the type declaration. Similarly, the function's return type is given after the parameter list; in C, we specify it before the function name. It takes time to get used to these differences, but this approach has resulted in considerably simpler declaration syntax than in C. In Go, a function can return more than one value. This code segment gets the integer return value from Atoi (a string-to-integer conversion function in the strconv package); it also gets an error value, which is useful to check in case the function fails:

str := "10"; i, err := strconv.Atoi(str); print(i)


Multiple return values are a useful feature in Go, particularly for handling error conditions. This code also shows the built-in print function for printing to the console; Go has a few built-in functions like this. In Go, many features are similar to each other. We just covered multiple return values; can you now guess (given that i and j are integers) what this statement does?

j, i = i, j;
Yes, that's right, it swaps the values in the variables i and j! Simple, concise and neat, right?

A function example

Here is a more difficult example using functions:

func transform(arr []int, foos []func (int) int) {
    for _, foo := range foos {
        for i := 0; i < len(arr); i++ {
            arr[i] = foo(arr[i])
        }
    }
    return
}

func main() {
    arr := []int{1, 2, 3, 4, 5}
    foos := []func (int) int {
        func(arg int) int { return arg * arg },
        func(arg int) int { return -arg },
    }
    transform(arr, foos)
    fmt.Println(arr)
}

We'll start from the transform function. It takes two arguments: an array of integers, and an array of functions (in which each function accepts a single int argument, and returns an int). In the for loop, we use the range keyword to get the indices and values of the array of functions. Since we want only the value here, we ignore the index by assigning it to _. Now, each value assigned to the foo variable is a function, and we call that function for each element in the integer array, storing the function's return value back in the same element of the integer array. In the main function, we declare a slice (that internally points to an array), and a slice of functions. We create two functions without names (function literals), and put them in the slice. We pass these two variables to the transform function. The first function literal squares the integer value; the second negates it. So, this program prints: [-1 -4 -9 -16 -25].

Introducing structs and methods

Here is the declaration of a simple struct named Point, with two integer members, x and y:

type Point struct { x, y int; };


We can declare methods outside the struct. Here is the Print method, which prints the values in a Point variable:

func (p Point) Print() { fmt.Printf("(x = %d, y = %d)", p.x, p.y); }


Method definition syntax is different from that of functions: a method definition takes a struct type before its name, which acts as an implicit parameter during the call to that method. Now, this code in the main function creates a Point variable, and prints out its value:

p := Point{10, 20}
fmt.Printf("%v", p)
p.Print()

It prints:

{10 20}(x = 10, y = 20)

Here, Point{10, 20} is a struct literal, and looks like a constructor call. We use the %v descriptor of the Printf family of functions to print the struct variable to the console, followed by our custom Print method's output.

Introducing interfaces

We'll build upon our Point struct example by declaring a Printer interface, which declares a Print method:

type Printer interface {
    Print()
}

type Point struct { x, y int }

func (p Point) Print() {
    fmt.Printf("(x = %d, y = %d)", p.x, p.y)
}

func main() {
    var p Printer = Point{10, 20}
    p.Print()
}
Interfaces are specified with the interface keyword. Note that these interfaces are not exactly the same as in C#/Java. The main difference is that a struct doesn't have to say it implements an interface: any struct that implements the methods specified by an interface satisfies that interface! Those familiar with dynamic languages will easily recognise that this is similar to duck typing in those languages. But remember, Go is a strictly type-checked, statically typed language, and yet it implements this feature!

A goroutines example

Goroutines are functions executing in parallel with other goroutines in the same address space. They are not the same as threads or processes, so they get a new name: goroutines. Goroutines communicate using channels (based on CSP). Here is an example, as given in the Effective Go document (see the Resources section at the end of this article), which shows how a Sort on a big list can be done in parallel with some other computation.

c := make(chan int);  // Allocate a channel.
// Start the sort in a goroutine; when it completes, signal on the channel.
go func() {
    list.Sort();
    c <- 1;  // Send a signal; value does not matter.
}();
doSomethingForAWhile();
<-c;  // Wait for sort to finish; discard sent value.
A channel is used for communication between goroutines; this code creates a channel using the make built-in function (which is similar to new, and not covered in this article). A goroutine is invoked using the go keyword. Here, the goroutine is created to call a function defined as a function literal, which does two things. First, it does a Sort on a list (assume that this is a time-consuming operation). Next, once the sorting is done, it sends a signal to the caller goroutine that it is done with its work. Meanwhile, the caller code (which is itself a goroutine) does some other work in the call doSomethingForAWhile, and waits for the other goroutine to finish in the statement <-c. This code segment effectively shows how goroutines are created and how communication is done between them.

Implementation status

Currently, Go implementations are available for two platforms: Linux and Mac OS X. There are two implementations for these platforms: one is a stand-alone implementation of a Go compiler and runtime, written by Ken Thompson. The other is a GCC front-end, implemented by Ian Lance Taylor. What about Windows? Some unofficial ports exist, but they are not very stable or up-to-date. I installed Go in Fedora, and the installation was smooth and easy. All of Go's toolset (compilers, packages, runtime libraries, tools, etc.) is available as open source, under a BSD licence.

Wrapping up

With my experience of trying Go over the last few months and writing non-trivial programs, I think there is a lot of substance behind the hype. Go certainly has some unique features that will make it a useful systems programming language, in this age of the Internet and multi-core processors. Within just a couple of days of starting to learn Go, I was able to write non-trivial programs, and began liking the language. It was fun learning and playing around with it. I would recommend Go to any systems programmer and encourage them to try it out.

Resources

Go websites: The official website is www.golang.org (the Web server is implemented in Go!). Lots of material on Go is available from this website, including tutorials and presentations. You can download Go compilers and tools from this website. An unofficial Go website is at http://go-lang.cat-v.org/; it has links to download the Windows version of Go, and links to numerous Go-related resources.

Learning Go: As of this writing, there is no book available on Go. However, the Go website has very good documents for learning Go. Check Effective Go at http://golang.org/doc/effective_go.html. Rob Pike's talk introducing Go is available at http://www.youtube.com/watch?v=rKnDgT73v8s. Go's discussion group is at http://groups.google.com/group/golang-nuts/.


Try Go online: I find it really handy to compile and run Go programs online (!) at the IDEOne website http://ideone.com/


Typo Bugs
Can typing mistakes (typos) cause bugs? Yes, they can! We'll look at some common C programming mistakes in this column.

Spelling correction example: http://en.wikipedia.org/wiki/File:Spelling_Correction_Example.jpg

What does the following program print?

Here, you would expect "in default" to get printed; but it does not print anything, and the default case does not execute. Why? The keyword default was mistyped as defalut. Why does the compiler still compile the code without complaining? In C (as in many other languages), a label (as a target of the goto statement) is a name followed by a ':' (colon). In this case, defalut: was treated as a label, and hence the bug!

In the May 2009 JoP column, we discussed a typographical (typing) mistake that resulted in the loss of the Mariner-I rocket, because of mistyping a '.' (full stop) instead of a ',' (comma). So, we'll go further and discuss some common typing mistakes in C that we as programmers make while writing code. Typing l (lowercase L) instead of 1, as in the statement:

In this case, the constant is 9 (nine) and not 91 (ninety-one), since l is a suffix to indicate a long constant. Typing = instead of ==, which turns a comparison expression into an assignment, as in:

This mistake is so very common in C that a defensive programming technique is to reverse the condition, as in if (0 == a), so that the compiler will catch the mistake if we make it. Typing == instead of =, which turns an assignment statement into a comparison, as in:

In fact, this example is from Peter van der Linden's book Deep C Secrets, in which he talks about a bug that was holding up a $20 million sales deal at Sun. It turned out to be this mistake in code in the I/O library. Typing a ; (semicolon) after a for loop by mistake, as in:

And this code prints hello once, and not ten times. Forgetting to type a comma between two strings, which leads to concatenation of the adjacent string literals, as in:

This prints raindeer instead of rain and deer. Can you find out what's wrong with this code?
struct Point { int x, y; } foo() {}

Here, we forgot to put a semicolon after the struct definition of Point. In old C, if we don't provide a return type, the function is considered to return int by default. However, with this definition, the function foo returns the struct Point! (Yes, we can define a struct/class on-the-fly as a return type in C/C++!) This typo bug has caused many sleepless debugging nights for programmers worldwide. As of this writing, the latest news is about a "tiny typo" blamed for a massive IE security failure.

Liskov's Substitution Principle


LSP is a cardinal rule to follow in object-oriented designs. In this column, we'll introduce LSP to those new to OOP (Object-Oriented Programming), and discuss a couple of examples from the JDK that violate this principle.

Featured image courtesy: mseery. Reused under the terms of CC-BY-NC 2.0 License.

The whole of object-oriented design boils down to a few principles, such as abstraction, encapsulation, modularization, hierarchy, and regularity. Abstraction allows us to model in terms of commonality between objects, and express the design in terms of the problem domain. Encapsulation allows bundling data and the functions operating on it as a single unit; further, encapsulation enables us to hide implementation details and expose only the interface to the users. Modularization allows separation of concerns and definition of crisp boundaries between abstractions. With hierarchy, we can organize or arrange abstractions at multiple levels. With regularity, we can create uniform solutions that are easier to grasp and understand. It takes years of experience to understand that these are the essence of OOP (and not the favourite features that your programming language provides), and following these generic principles gives us incredible power in problem solving.

Let us take the hierarchy principle, for example. One way to realize hierarchy is to create a relationship between abstractions, using a language feature named inheritance. For example, in an image-processing application, instead of writing code in terms of different kinds of image files such as GIF, JPEG, EPS, SVG, PNG, etc., one can exploit the commonalities between the image file formats. We can classify image types at a higher level as raster (storing images as bitmaps, in terms of pixels) or vector (a geometric description of images) formats. This allows for abstracting the commonalities, and moving the specific details of a file format to the class for that format. In other words, we can have a base class named ImageFile, and two derived classes, namely, RasterFile and VectorFile. Further, RasterFile can have derived classes such as JPEGFile, TIFFFile, PNGFile, etc., and VectorFile can have derived classes such as CGMFile, SVGFile, etc. With this design, when we want to write high-level code, for example, reading the file from the disk, we can write it in terms of the generic type ImageFile. If there is any need to refer to specific types, for example, when converting from one image type to another, we can use specific file types such as JPEGFile, SVGFile, etc. With this design approach, it is possible to reuse code: the general code applicable to all ImageFiles can be moved to that class, more specific code relating to RasterFiles and VectorFiles can move to those classes, and concrete details on image formats such as JPEG or SVG can go into the corresponding classes such as JPEGFile, SVGFile, etc.

To summarize: with inheritance, we exploit commonalities between implementations by abstracting the interface. Now, code can be written in terms of the common base interface. When specific derived implementations are assigned to base interface references, the user code need not change. Hence, this approach leads to reusability and flexibility in design. However, this fundamental benefit of inheritance is broken if the derived types cannot be used through the base types. Liskov's Substitution Principle (LSP) describes such a situation, and hence is a cardinal rule to follow in OOP.

The informal definition of LSP is this: Derived classes must be usable through the base class interface without the need for the user to know the difference. I know this description is difficult to understand, and Ill explain LSP using two examples from the Java library, to illustrate what it means to violate this principle. Every computer science student who has taken a data structure course knows that a stack is not a vector. A vector is just like an array, only that it can grow in size. So we can insert or delete elements from anywhere in the vector. However, stack is a LIFO (Last In First Out) data structure: we can insert and remove only from one end of the data structure. Hence, a stack is not a vector; maybe a stack can be implemented using a vector. In JDK, these two container classes share an inheritance relationship: Stack extends Vector. For this reason, we can add or remove elements anywhere from the Stack! Here is a code example that illustrates this problem: Vector<String> vectorStack = new Stack<String>(); vectorStack.addElement("one"); vectorStack.addElement("two"); vectorStack.addElement("three"); vectorStack.removeElementAt(1); System.out.println(vectorStack.size()); // prints: 2 As you can see, in this program, we can remove the element from the middle of the Stack (with the call removeElementAt(1)), and treat Stack as if it were a Vector! So, how do we treat a Stack as a stack when we have a Vector reference? One way is for the user to check the dynamic type (i.e., what class type the object reference points to at runtime) of the object and if that is a Stack, do a downcast to Stack and apply operations such as push and pop. This is too much of a workaround because of a design mistake in the Java library. In other words, Stack could have been declared as an interface, and different Stack implementations could have been derived from it. 
Or else, Stack could have been implemented using Vector as the data container, i.e., using containment instead of inheritance.
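As a minimal sketch of that containment alternative (this is my own illustration, not code from the JDK; the class name ContainedStack is invented), the Vector becomes a private implementation detail, so callers simply cannot bypass the LIFO discipline:

```java
import java.util.Vector;

// Sketch of "containment instead of inheritance": the Vector is hidden,
// and only stack operations are exposed.
class ContainedStack<E> {
    private final Vector<E> elements = new Vector<>();

    public void push(E item) {
        elements.add(item);                           // insert only at the top
    }

    public E pop() {
        return elements.remove(elements.size() - 1);  // remove only from the top
    }

    public E peek() {
        return elements.lastElement();
    }

    public boolean isEmpty() {
        return elements.isEmpty();
    }
}
```

Because a ContainedStack is not a Vector, a call like removeElementAt(1) does not even compile against it; the misuse shown above is ruled out at compile time rather than discovered at runtime.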

Figure 1: Two LSP violation examples from JDK

Another example of an LSP violation is the Properties class, which extends Hashtable. A Hashtable can take any non-null object as a key or a value, whereas a Properties object is meant to take only Strings as keys and values. The Properties class has methods like getProperty and setProperty to get and set property values. However, we still have access to methods such as put and putAll from Hashtable, which we can use to put keys and values of any type. When we attempt to put non-String keys or values into a Properties object, it appears to work fine:

Hashtable extnNos = new Properties();
extnNos.put("Kathy", "3542");
extnNos.put("Joel", "4433");
// mistake in the following statement:
// 3224 typed instead of "3224"
extnNos.put("Joshua", 3224);
System.out.println(extnNos);

This code segment prints:

{Kathy=3542, Joshua=3224, Joel=4433}

However, the Java documentation calls such Properties objects "compromised": method calls such as store, save or list fail when invoked on them. Here is a slightly modified version of the same code, using the Properties.list method instead of System.out.println:

Hashtable extnNos = new Properties();
extnNos.put("Kathy", "3542");
extnNos.put("Joel", "4433");
// mistake in the following statement:
// 3224 typed instead of "3224"
extnNos.put("Joshua", 3224);
((Properties)extnNos).list(System.out);

This code results in a crash:

-- listing properties --
Kathy=3542
Exception in thread "main" java.lang.ClassCastException:
java.lang.Integer cannot be cast to java.lang.String
        at java.util.Properties.list(Unknown Source)
        at UseProperties.main(UseProperties.java:9)

This design (Properties extending Hashtable) violates LSP. Properties would have been better implemented using a Hashtable internally, i.e., without any inheritance relationship between the two. Because of this design mistake, users of the Properties class must take extra care to use the class correctly. Since the JDK is a library (a published API), it is not possible to correct these design mistakes now. As users, we cannot do anything about them, but we can learn from them: following established design principles or rules can help create better designs, and ignoring them can be costly.
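As a sketch of how containment would have avoided the problem (my own illustration, not the JDK's actual design; the class name SafeProperties is invented), a string-only wrapper around a Hashtable lets the compiler reject non-String keys and values outright:

```java
import java.util.Hashtable;

// Containment instead of inheritance: the Hashtable is private, and the
// only way in is through String-typed methods, so the object can never
// be "compromised" the way a java.util.Properties instance can.
class SafeProperties {
    private final Hashtable<String, String> table = new Hashtable<>();

    public void setProperty(String key, String value) {
        table.put(key, value);      // only String/String pairs are possible
    }

    public String getProperty(String key) {
        return table.get(key);      // null if the key is absent
    }
}
```

With this design, the earlier mistake of writing 3224 instead of "3224" becomes a compile-time error: setProperty("Joshua", 3224) simply does not type-check, so the crash in list can never happen at runtime.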

Why is a Software Glitch Called a Bug?


In this column, we've always covered unusual and interesting technical topics. This month, we discuss the word bug, and the history behind its use in the software context.

Moth that caused the malfunction in the Mark II computer

There are three things common to every software engineer living in this world: managers, deadlines and bugs! Everyone can understand the first two, but software bugs being a part of one's life is certainly unique to programmers. All programmers grapple with bugs: they get the software working by avoiding bugs; they debug and fix bugs; they track bugs... But why is a software glitch called a bug? It certainly is not an insect.

Some people say the word bug comes from the old English (Welsh) word bwg, which meant a problem or a difficulty; later, it was used to describe defects in machines, and then in computers. The word seems to have caught on with computers because of an incident involving an insect in an early computer. The Mark II was an early electromechanical computer used by the US Navy. On September 9, 1947, when the operators were using the computer to perform calculations, it gave wrong results. To find out what was going wrong, they opened the computer and looked inside (remember, this was in the good old days, when an electromechanical computer was in use). There they found a moth stuck inside the machine, which had caused the malfunction! The operators promptly removed it, pinned it to the log report, and wrote the following description: "First actual case of bug being found" (see image). They also coined the word debug, which meant taking the bug out to get the computer working again.

Perhaps this is what prompted the well-known computer scientist Edsger W. Dijkstra to say (in a lighter vein), "If debugging is the process of removing bugs, then programming must be the process of putting them in." Along the same lines, the word patch, which means applying a fix for a bug in a program, comes from the old days, when programmers used to fix a program stored on paper tape using glue and paper!

There are many terms used in software engineering to describe a problem in software, for example: defect, error, malfunction, anomaly, fault and failure. There are shades of difference in the meanings of these terms, and various standards and organisations define or use them in different ways, often causing confusion. In practice, the most widely used and colloquial term for a software defect is bug.
Dijkstra called for "cleaning up our language" by no longer calling a bug a bug, but calling it an error, because the careless or casual use of a word like "bug" when referring to computer defects takes the seriousness out of them. However, the word bug seems to have caught on, and perhaps it is too late to change the terminology.

A Bug or a Feature?
A puzzling aspect of bugs is that they often turn out to be features (and vice versa)! Let's explore this interesting topic with an example.

Figure 1: A 'bug vs feature' example from MS PowerPoint

In my experience working with enterprise software, I come across numerous bug reports from users. On detailed analysis, such reports often turn out to describe actual features of the software, or functionality the user has misunderstood. It is difficult to give a generic example of this problem, since it is very specific to the application's context. However, I recently came across an example anyone can relate to. I was trying to rotate an image in MS PowerPoint: you can right-click on an image and type the rotation angle (in degrees) into the text box that pops up (see Figure 1).

By mistake, I typed a value outside the range -360 to +360, and a help message popped up. I thought it was asking me to "Enter a value from -360 to 360", but on looking closer, I realised it read, "Enter a value from -3600 to 3600"! I thought this was a bug in both the help message and the software, because the text box accepted values in the range -3600 to +3600. However, when I tried typing values outside that range, the software rounded the value to -3600 or +3600 as appropriate, and then used the value modulo 360 to determine the rotation angle. So it was clear that this was not a bug, but an intended feature of the software!

So, in general, should this be considered a bug or a feature? We all know that you can rotate an image from -360 to +360 degrees; for values outside this range, the effective rotation angle is the value mod 360. So, when we try rotating an image in software, we expect one of two possibilities: either the software limits the input to the range -360 to +360, or it accepts any value and applies mod 360 to determine the rotation angle. It is unintuitive and unexpected that the allowed range is -3600 to +3600: the range of acceptable values is exactly 10 times the expected range, an arbitrary limitation assumed by the software. By this argument, it is a bug.

However, if you report this to the developers of the software, they will reject it as a bug. Why? From the software vendor's perspective, this range was thought through, and the extra range (10 times the expected one) is allowed deliberately as a feature; hence it is not a bug! In my work experience, I remember many heated debates with customers (and even within development teams) about whether a given problem was a bug or a feature. The problem becomes complicated, and results in a strange situation, when a bug ends up being used as a feature!
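PowerPoint's actual code is of course not public, but the behaviour described above can be sketched in a few lines (the class and method names here are my own invention): clamp the typed value to [-3600, 3600], then reduce it mod 360.

```java
// Hypothetical sketch of the observed behaviour, not PowerPoint's code:
// out-of-range input is clamped to [-3600, 3600], and the effective
// rotation is the clamped value modulo 360.
class Rotation {
    static int clampDegrees(int typed) {
        return Math.max(-3600, Math.min(3600, typed));
    }

    static int effectiveAngle(int typed) {
        // Java's % keeps the sign of the dividend, so -400 maps to -40.
        return clampDegrees(typed) % 360;
    }
}
```

For example, effectiveAngle(400) gives 40, while effectiveAngle(3700) is first clamped to 3600 and then reduced to 0, which matches the "rounded, then mod 360" behaviour observed in the dialog.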
Assume that the software you have shipped has a bug. The customers don't know it is a bug; they think it is a feature, and start using it. Now, even if you realise it is a bug, you cannot fix it. Why? Since customers think it is a feature, if that "feature" is no longer available in the new version, they will complain in a big way. Hence, you need to support that bug in future releases too. This requirement is known as bug compatibility: you not only need to maintain compatibility with the old features of the software, but also with its old bugs in newer versions! The bug-compatibility problem is especially pronounced in APIs (Application Programming Interfaces). For example, once a library or framework is publicly available, you cannot fix bugs that were introduced in earlier releases: doing so would break compatibility for existing users who still depend on the old behaviour. This is one of the reasons API development is such a challenging task; we need to get it right the first time, or else we'll never be able to fix it.

Types of Bugs
In this column, we'll look at four types of bugs, named after famous scientists. The classification is interesting; it shows just how strange bugs can be!

Feature image courtesy: Olivia. Reused under the terms of CC-BY-NC-SA 2.0 License.

Jim Gray, in his well-known paper [Why do computers stop and what can be done about it?, Symposium on Reliability in Distributed Software and Database Systems, 1986], originally proposed the classification of bugs into Bohrbugs and Heisenbugs, named after well-known scientists. Today, more bug types are known to us, so we'll also look at two other categories.

Bohrbugs: Most of the bugs we come across are reproducible, and are known as Bohrbugs. They are named after Niels Bohr, who proposed a simple and easy-to-understand atomic model in 1913. In Bohr's model, things like the path and momentum of an electron in an atom are predictable. Similarly, Bohrbugs are predictable: you can reproduce them if you

run the software under similar conditions. For example, when a program crashes with a null-pointer access, it always crashes there for a given input, so you can easily reproduce it.

Heisenbugs: All experienced programmers have faced situations where a bug that crashed the software simply disappears when the software is restarted. No matter how much time and effort is spent trying to reproduce the problem, the bug eludes us. Such bugs were named Heisenbugs, after Werner Heisenberg, who is known for his uncertainty principle: it is not possible to determine both the position and the velocity of an electron at a given moment with arbitrary accuracy. When bugs change their behaviour as you try to debug, probe or isolate them, they are called Heisenbugs. This can happen, for example, when you use uninitialised variables. When the program is run normally, it accesses the uninitialised variables and misbehaves; but when you run it under a debugger, it might work just fine, because many debuggers initialise memory to zeros, so you might never hit the problem!

Mandelbugs: When the cause of a bug is too complex to understand, and the resulting behaviour appears chaotic, it is called a Mandelbug. These are named after Benoît Mandelbrot, who is considered the father of fractal geometry (fractals are complex, self-similar structures). A bug in an operating system that depends on scheduling is an example of a Mandelbug.

Schroedinbugs: Sometimes you look into code and find a bug or problem that should never have allowed the code to work in the first place; and when you then try out the code, the bug promptly shows up, and the software fails! Though it sounds very uncommon, such bugs do occur, and they are known as Schroedinbugs. They are named after the scientist Erwin Schrödinger, who proposed a theoretical cat experiment.
In quantum physics, quantum particles such as atoms can exist in two or more quantum states, but Schrödinger suggested that a more classical object like a cat, which is made up of many atoms, could not plausibly exist in two states. He theorised a scenario in which a cat is kept in a sealed chamber with a vial of poison attached to a radioactive atom. If the atom decays, the vial is smashed and the poison leaks out, killing the cat. But with the chamber sealed, there is no way to know whether the cat is dead or alive; so until the chamber is opened, the cat could, theoretically, be in either of two states, dead or alive. In quantum physics, this is called a superposition state, in which the cat is both alive and dead! Coming back to bugs: by merely observing the problem in the code, you change the outcome, and either the software works or it breaks. That is why these bugs are known as Schroedinbugs.

There are also other types of bugs that don't fall into these categories. For instance, aging-related bugs occur only after the software has been running for a long time.
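The contrast between the first two categories can be sketched in code (this is my own illustration, not an example from the column; the class and method names are invented). The first method is a Bohrbug: it fails identically on every run for the same input. The second is a Heisenbug: an unsynchronised shared counter whose result depends on thread scheduling, so it can change from run to run and often hides when the timing is perturbed, for example, by a debugger.

```java
// Sketch contrasting a reproducible Bohrbug with a timing-dependent
// Heisenbug (illustrative names, not from the original article).
class BugKinds {
    // Bohrbug: for the same input (null), this fails the same way on
    // every run, so it is trivial to reproduce and fix.
    static int length(String s) {
        return s.length();   // NullPointerException whenever s == null
    }

    // Heisenbug: two threads increment a shared counter with no
    // synchronisation. Lost updates depend on the interleaving, so the
    // final value may be anywhere up to 2 * perThread.
    static int count;

    static int racyCount(int perThread) {
        count = 0;
        Runnable r = () -> { for (int i = 0; i < perThread; i++) count++; };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return count;
    }
}
```

Running racyCount repeatedly will often return less than 2 * perThread; single-stepping the same code in a debugger tends to serialise the threads and make the loss disappear, which is exactly the Heisenbug effect described above.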
