1 DATA REPRESENTATIONS
Number systems.
There are many ways to represent integers: the number of days in the month of
October can be represented as 31 in decimal, 11111 in binary, 1F in hexadecimal,
or XXXI in Roman Numerals. It is important to remember that an integer is an
integer, no matter whether it is represented in decimal or with Roman Numerals.
Decimal numbers. We are most familiar with
performing arithmetic with the decimal (base
10) number system. This number system has
been widely adopted, in large part because we
have 10 fingers. However, other number
systems still persist in modern society.
Sexagesimal numbers. The Sumerians used a
sexagesimal (base 60) number system. We
speculate that 60 was chosen since it is
divisible by many integers: 1, 2, 3, 4, 5, 6, 10,
12, 15, 20, and 30. Most clocks are based on
the sexagesimal system. The Babylonians
inherited sexagesimal numbers from the
Sumerians. They divided a circle into 360
degrees since they believed the Sun rotated
around the Earth in about 360 days. Ptolemy
tabulated trigonometric tables using base 60,
and, even today, we still often use degrees
instead of radians when doing geometry.
Binary numbers. Computers are based on the
binary (base 2) number system because each
wire can be in one of two states (on or off).
Hexadecimal numbers. Writing numbers in
binary is tedious since this representation uses
between 3 and 4 times as many digits as the
decimal representation. The hexadecimal
(base 16) number system is often used as a
shorthand for binary. Base 16 is useful
because 16 is a power of 2, and numbers have
roughly as many digits as in the corresponding
decimal representation.
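In positional notation with base b, an integer x is written as a sequence of digits
x_n x_(n-1) ... x_1 x_0, where
x = x_n b^n + x_(n-1) b^(n-1) + ... + x_1 b^1 + x_0 b^0
The x_i terms are the positional digits, and each digit is required to be an integer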
between 0 and b - 1. In binary, the two digits (also referred to as bits) are 0 and 1;
in decimal, the ten digits are 0 through 9; in hexadecimal, the sixteen digits are 0
through 9 and the letters A through F. Every nonnegative integer can be expressed
using positional notation, and the representation is unique (up to an arbitrary
number of leading 0s). As an example the number of days in a leap year is 366_10 =
101101110_2 = 16E_16 since:
366 = 3×10^2 + 6×10^1 + 6×10^0
366 = 1×2^8 + 0×2^7 + 1×2^6 + 1×2^5 + 0×2^4 + 1×2^3 + 1×2^2 + 1×2^1 + 0×2^0
366 = 1×16^2 + 6×16^1 + E×16^0
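As a quick check of the three representations above, the following sketch prints an integer in several bases using the standard library method Integer.toString(int, int). The class name RadixDemo and the helper inBase are made up for this illustration.

```java
class RadixDemo {
    // Return the base-b representation of x (2 <= b <= 36), using uppercase letters.
    // Integer.toString produces lowercase letters, so we convert the result.
    static String inBase(int x, int b) {
        return Integer.toString(x, b).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(inBase(366, 2));    // 101101110
        System.out.println(inBase(366, 10));   // 366
        System.out.println(inBase(366, 16));   // 16E
        System.out.println(inBase(31, 16));    // 1F
    }
}
```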
The following table gives the binary, decimal, and hexadecimal representations of
the first 48 integers.
BINARY  DEC  HEX      BINARY  DEC  HEX      BINARY  DEC  HEX
     0    0    0       10000   16   10      100000   32   20
     1    1    1       10001   17   11      100001   33   21
    10    2    2       10010   18   12      100010   34   22
    11    3    3       10011   19   13      100011   35   23
Number conversion.
You need to know how to convert from a number represented in one system to
another.
Converting from base b to decimal. To convert
from an integer represented in base b to decimal,
multiply the ith digit by the ith power of b, and sum
up the results. For example, the binary number
101101110 is 366 in decimal.
  1    0    1    1    0    1    1    1    0    (binary)
256  128   64   32   16    8    4    2    1    (powers of 2)
---------------------------------------------
256 + 64 + 32 + 8 + 4 + 2                      (multiplied by corresponding digit)
= 366
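The conversion can be sketched in Java as follows. Rather than computing each power of b separately, this version uses Horner's rule (multiply the running value by b, then add the next digit), which yields the same sum. The class and method names are hypothetical, and the code assumes 2 <= b <= 36 and a result that fits in an int.

```java
class BaseToDecimal {
    // Convert a string of base-b digits (0-9, then A-Z) to an int.
    // Assumption: 2 <= b <= 36 and the value fits in an int.
    static int toDecimal(String digits, int b) {
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = Character.digit(digits.charAt(i), b);   // -1 if not a valid base-b digit
            if (d < 0) throw new IllegalArgumentException("bad digit: " + digits.charAt(i));
            value = value * b + d;   // Horner's rule: shift one position left, add the digit
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("101101110", 2));   // 366
        System.out.println(toDecimal("16E", 16));        // 366
    }
}
```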
One way to perform arithmetic is to convert all of the numbers to base 10, perform
arithmetic as usual, and then convert back. In many cases, it is easier to perform
the arithmetic directly in the given number system.
Addition. In grade school you learned how to
add two decimal integers: add the two least
significant digits (rightmost digits); if the sum
is 10 or more, then carry a 1 and write
down the sum modulo 10. Repeat with the
next digit, but this time include the carry in
the addition. The same procedure generalizes
to base b by replacing the 10 with the base b.
For example, if you are working in base 16
and the two summand digits are 7 and E, then
you should carry a 1 and write down a 5
because 7 + E = 15_16. Below, we compute
4567_10 + 366_10 = 4933_10 in binary (left), decimal
(middle) and hexadecimal (right).
  1 0 0 0 1 1 1 0 1 0 1 1 1        4 5 6 7        1 1 D 7
+ 0 0 0 0 1 0 1 1 0 1 1 1 0      +   3 6 6      +   1 6 E
---------------------------      ---------      ---------
  1 0 0 1 1 0 1 0 0 0 1 0 1        4 9 3 3        1 3 4 5
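The digit-by-digit procedure can be sketched in Java as below. This is an illustration only: the class name, the method name, and the choice to represent numbers as digit strings are assumptions, and the code assumes 2 <= b <= 16 and valid digits.

```java
class BaseAddition {
    static final String DIGITS = "0123456789ABCDEF";

    // Add two nonnegative integers given as base-b digit strings (2 <= b <= 16).
    static String add(String x, String y, int b) {
        StringBuilder sum = new StringBuilder();
        int carry = 0;
        int i = x.length() - 1, j = y.length() - 1;   // start at the least significant digits
        while (i >= 0 || j >= 0 || carry > 0) {
            int dx = (i >= 0) ? Character.digit(x.charAt(i--), b) : 0;
            int dy = (j >= 0) ? Character.digit(y.charAt(j--), b) : 0;
            int s = dx + dy + carry;
            sum.append(DIGITS.charAt(s % b));   // write down the sum modulo b
            carry = s / b;                      // carry a 1 if the sum is b or more
        }
        return sum.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(add("1000111010111", "101101110", 2));   // 1001101000101
        System.out.println(add("4567", "366", 10));                 // 4933
        System.out.println(add("11D7", "16E", 16));                 // 1345
    }
}
```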
Multiplication. One compelling reason to use
positional number systems is to facilitate
multiplication. Multiplying two Roman
Numerals is awkward and slow. In contrast,
the grade school algorithm for multiplying two
decimal integers is straightforward and
reasonably efficient. As with addition, it easily
generalizes to handle base b integers. All
intermediate single-digit multiplications and
additions are done in base b. Below, we multiply
the decimal integers 4,567 and 366 (left) and then
again in hex (right).
      4 5 6 7              1 1 D 7
*       3 6 6        *       1 6 E
-------------        -------------
    2 7 4 0 2              F 9 C 2
  2 7 4 0 2              6 B 0 A
1 3 7 0 1              1 1 D 7
-------------        -------------
1 6 7 1 5 2 2          1 9 8 1 6 2
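A sketch of the grade school algorithm in base b follows. Instead of writing out each partial product and summing them at the end, this variant adds each single-digit product into a running total as it goes, which gives the same result. The names are hypothetical, and the code assumes 2 <= b <= 16 and nonempty strings of valid digits.

```java
class BaseMultiplication {
    static final String DIGITS = "0123456789ABCDEF";

    // Multiply two nonnegative integers given as base-b digit strings (2 <= b <= 16).
    static String multiply(String x, String y, int b) {
        int[] result = new int[x.length() + y.length()];   // least significant digit first
        for (int i = 0; i < x.length(); i++) {
            int dx = Character.digit(x.charAt(x.length() - 1 - i), b);
            int carry = 0;
            for (int j = 0; j < y.length(); j++) {
                int dy = Character.digit(y.charAt(y.length() - 1 - j), b);
                int t = result[i + j] + dx * dy + carry;   // single-digit product plus carry
                result[i + j] = t % b;
                carry = t / b;
            }
            result[i + y.length()] += carry;
        }
        int k = result.length - 1;
        while (k > 0 && result[k] == 0) k--;               // skip leading zeros
        StringBuilder sb = new StringBuilder();
        for (; k >= 0; k--) sb.append(DIGITS.charAt(result[k]));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(multiply("4567", "366", 10));   // 1671522
        System.out.println(multiply("11D7", "16E", 16));   // 198162
    }
}
```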
Negative integers.
On a machine with 16-bit words, there are 2^16 = 65,536 possible integers that can
be stored in one word of memory. By interpreting the 16 bits as a binary number,
we obtain an unsigned integer in the range 0 through 65,535. Instead, we can
interpret the leading bit as the sign of the number, using two's complement
notation. This allows us to interpret the 16 bits as a signed integer in the range
-32,768 through +32,767, as described in the table below. As with binary integers, it is often
convenient to express 16-bit two's complement integers using hexadecimal notation.
BINARY                 HEX     DECIMAL
0000 0000 0000 0000    0000    0
...
...
Converting a hexadecimal two's complement
integer into decimal. To convert the 16-bit
two's complement integer FE92 into decimal,
we start by writing down its binary
representation: 1111 1110 1001 0010. We
recognize it as a negative integer since the
most significant bit is 1. Then, we negate it
(flip bits and add 1) to obtain: 0000 0001
0110 1110. We convert 101101110 (binary)
to 366 (decimal). After putting back in the
negative sign, we obtain the final answer of
-366 (decimal).
Converting a negative decimal integer into its
hexadecimal two's complement
representation. To convert from the decimal
integer -366 to its 16-bit two's complement
representation, we can apply the above steps
in reverse. First we convert 366 (decimal) into
101101110 (binary). Next, we negate it (flip
bits and add one) to obtain 1111 1110 1001
0010. It is important that we fill in all 16 bits.
If desired, we can convert this to hexadecimal:
FE92.
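Both conversions can be checked with a few lines of Java. The sketch below is only one way to do it; the class and method names are invented for this example. It relies on the fact that narrowing an int to a short keeps the low 16 bits and reinterprets bit 15 as the sign.

```java
class TwosComplement16 {
    // Interpret a hexadecimal string (up to 4 digits) as a 16-bit two's complement integer.
    static int fromHex(String hex) {
        int unsigned = Integer.parseInt(hex, 16);   // 0 through 65535
        return (short) unsigned;                    // the cast reinterprets bit 15 as the sign
    }

    // Negate a 16-bit pattern: flip the bits, add 1, and keep only the low 16 bits.
    static int negate16(int x) {
        return (~x + 1) & 0xFFFF;
    }

    // 16-bit two's complement representation of x, as 4 hexadecimal digits.
    static String toHex(int x) {
        return String.format("%04X", x & 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(fromHex("FE92"));      // -366
        System.out.println(negate16(0xFE92));     // 366
        System.out.println(toHex(-366));          // FE92
    }
}
```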
Adding two's complement integers. Adding
two's complement integers is straightforward:
add the numbers as if they were unsigned
integers, ignoring any overflow. Below we
compute 4567_10 + -366_10 = 4201_10 in decimal
(left) and again in binary using 16-bit two's
complement integers (right). Note that the second
binary integer represents a negative integer using
two's complement notation.
     4 5 6 7        0 0 0 1 0 0 0 1 1 1 0 1 0 1 1 1
+     -3 6 6      + 1 1 1 1 1 1 1 0 1 0 0 1 0 0 1 0
------------      ---------------------------------
     4 2 0 1        0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1

  -3 2 7 6 6        1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
+     -3 6 6      + 1 1 1 1 1 1 1 0 1 0 0 1 0 0 1 0
------------      ---------------------------------
  -3 3 1 3 2        0 1 1 1 1 1 1 0 1 0 0 1 0 1 0 0

The second sum overflows: -33,132 does not fit in 16 bits, so the 16-bit result
is the positive integer 32,404.
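The two sums above can be reproduced in Java using the 16-bit short type. In this sketch (names are hypothetical), the addition is carried out on ints and the cast back to short discards everything above bit 15, which is exactly "ignoring any overflow". The second call shows the wraparound when the true sum does not fit in 16 bits.

```java
class TwosComplementAdd {
    // Add two 16-bit two's complement integers, discarding any carry out of bit 15.
    static short add16(short x, short y) {
        return (short) (x + y);   // x + y is computed as an int; the cast keeps the low 16 bits
    }

    // The 16 bits of x as a string of 0s and 1s, padded with leading zeros.
    static String bits16(short x) {
        String s = Integer.toBinaryString(x & 0xFFFF);
        return String.format("%16s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        short a = add16((short) 4567, (short) -366);
        System.out.println(a + " " + bits16(a));     // 4201 0001000001101001
        short b = add16((short) -32766, (short) -366);
        System.out.println(b + " " + bits16(b));     // 32404 0111111010010100
    }
}
```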
unsigned right shift x >>> y Move the bits of x to the right y positions
Program BitWhacking.java reads in two integers a and b from the command line,
applies the bit-whacking operations, and prints the results.
Bitwise not.
Computers differ in the way in which they store multi-byte chunks of information,
e.g., the 16-bit short integer 0111000011110010 = 70F2. This consists of the two
bytes 70 and F2, where each byte encodes 8 bits. There are two primary formats,
and they differ only in the order or "endianness" in which they store the bytes.
Big endian systems store the most significant
bytes first, e.g., they store the integer above
in its natural order 70F2. Java uses this
format, as do Apple Mac, IBM PowerPC G5,
Cray, and Sun Sparc.
Little endian systems store the least significant
bytes first, e.g., they store the integer above
in reverse byte-order F270. This format is
more natural when manually performing
arithmetic since, for example, addition works
from the least significant byte to the most
significant byte. Intel 8086, Intel Pentium, and
Intel Xeon use this format.
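To see the two byte orders side by side, the following sketch encodes the 16-bit integer 70F2 with java.nio.ByteBuffer, once in each order, and prints the resulting bytes in hexadecimal. The class and helper names are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Encode a 16-bit integer as two bytes in the given byte order.
    static byte[] encode(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    // Format bytes as uppercase hexadecimal, in the order they are stored.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        short x = (short) 0x70F2;
        System.out.println(hex(encode(x, ByteOrder.BIG_ENDIAN)));      // 70F2
        System.out.println(hex(encode(x, ByteOrder.LITTLE_ENDIAN)));   // F270
    }
}
```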
Computer scientists occasionally engage in religious wars about which is better.
Fortunately, Java hides the endianness from the end user, so if you create binary
data files in Java, you won't need to worry about endianness when sharing them
over the Internet with Java users on Mac, PC, and Solaris platforms. Unless you
need to read in a legacy binary file (e.g., written in C on a PC), you shouldn't have
to directly confront these details.
Historical note: the terms big endian and little endian derive from Gulliver's Travels,
in which the Lilliputians argue over whether to crack an egg at the little end or the big end.
Q+A
Q. My program only needs integers between -32,768 and 32,767. Does using a 16-
bit short make my program faster? Does it save space?
Q. Can I apply bitwise & and bitwise | to boolean values? If so, is there any
difference between the corresponding logical operators && and ||?
A. Yes. The difference is that the logical operators are subject to short circuiting:
the expression (f(x) && g(x)) will not evaluate g(x) if f(x) is false. This also
explains why there is no ^^ operator for logical XOR: it is never possible to short-
circuit an XOR, so it would always be identical to bitwise ^.
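The difference is easy to observe by counting how often the right operand is evaluated. In the sketch below, the class name, the counter, and the method g are all hypothetical; g simply records that it was called.

```java
class ShortCircuitDemo {
    static int calls = 0;

    // Stand-in for g(x): records that it was evaluated.
    static boolean g() {
        calls++;
        return true;
    }

    // Evaluate f && g() or f & g() and report how many times g() ran.
    static int countCalls(boolean f, boolean useShortCircuit) {
        calls = 0;
        boolean result = useShortCircuit ? (f && g()) : (f & g());
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(countCalls(false, true));    // 0: && skips g() when f is false
        System.out.println(countCalls(false, false));   // 1: & always evaluates g()
        System.out.println(countCalls(true, true));     // 1: && must evaluate g() when f is true
    }
}
```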
A. You can't. For an int, Java only uses the five low-order bits of the second operand. This has
the effect of reducing the shift distance mod 32. Another consequence of this is
that left shifting by a negative integer does not right shift the number. This
behavior coincides with the physical hardware on many microprocessors.
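A few concrete shifts illustrate the answer above. The class and method names in this sketch are invented; the behavior shown is that of the int shift operator, which uses only the five low-order bits of the shift distance.

```java
class ShiftDistanceDemo {
    // Left shift an int; Java uses only the low five bits of distance.
    static int shiftLeft(int x, int distance) {
        return x << distance;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 32));   // 1, since 32 mod 32 = 0
        System.out.println(shiftLeft(1, 33));   // 2, since 33 mod 32 = 1
        System.out.println(shiftLeft(1, -1));   // -2147483648, since -1 is treated as 31
        System.out.println(shiftLeft(8, -1));   // 0, not 4: a negative distance is not a right shift
    }
}
```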
Q. I need an unsigned 32-bit integer, but Java only has signed 32-bit integers.
What should I do?
A. First, are you sure that you really need an unsigned type? Signed and unsigned
integers behave identically on the bitwise operators (except >>), addition,
subtraction, and multiplication. In many applications, these are sufficient, assuming
you replace >> with >>>. Comparison operators are easy to simulate by checking
the sign bit. Division and remainder are the trickiest: the easiest solution is to
convert to type long.
long MASK = (1L << 32) - 1; // 0x00000000FFFFFFFF;
int quotient = (int) ((a & MASK) / (b & MASK));
int remainder = (int) ((a & MASK) % (b & MASK));
Program UnsignedDivision.java uses this trick, and also does it directly using 32-bit
operations.
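The masking trick can be wrapped in a small class and compared against the library methods Integer.divideUnsigned and Integer.remainderUnsigned (available since Java 8). This is a minimal sketch with a made-up class name, not the UnsignedDivision.java program mentioned above.

```java
class UnsignedDivisionSketch {
    static final long MASK = (1L << 32) - 1;   // 0x00000000FFFFFFFF

    // Treat a and b as unsigned 32-bit integers and divide using 64-bit arithmetic.
    static int divide(int a, int b) {
        return (int) ((a & MASK) / (b & MASK));
    }

    static int remainder(int a, int b) {
        return (int) ((a & MASK) % (b & MASK));
    }

    public static void main(String[] args) {
        int a = -2;   // as an unsigned 32-bit integer, this is 4294967294
        int b = 3;
        System.out.println(divide(a, b));                       // 1431655764
        System.out.println(remainder(a, b));                    // 2
        System.out.println(Integer.divideUnsigned(a, b));       // 1431655764
        System.out.println(Integer.remainderUnsigned(a, b));    // 2
    }
}
```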
Q. I need an unsigned 8-bit integer, but Java only has signed 8-bit integers (bytes).
What should I do?
A. Same advice as previous question. One place where it's nice to have unsigned
integers is for a lookup table, indexed by the byte. With signed integers the index
can be negative. Also, if b is a byte, then b << 4 automatically promotes b to an int.
This could be undesirable since b is signed. In many applications you need to
remove the sign-extended bits via (b << 4) & 0xff.
A. In Java, byte is an 8-bit signed integer. Before the shift, b is promoted to
an int (with sign extension). You may want ((b & 0xff) << i) instead.
Exercises
Answer: 1011100.
2. Convert the hexadecimal number BB23A to
octal.
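Answer: 2731072. First convert to binary, 1011 1011 0010 0011 1010, then regroup
the bits three at a time, 10 111 011 001 000 111 010, and convert each group to an
octal digit.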
6. How many bits are in the binary
representation of 2^2^2^2^17?
7. IPv4 is the protocol developed in the 1970s
that dictates how computers on the Internet
communicate. Each computer on the Internet
needs its own Internet address. IPv4 uses 32
bit addresses. How many computers can the
Internet handle? Is this enough for every
human being to have their own? Every cell
phone and toaster?
8. IPv6 is an Internet protocol in which each
computer has a 128 bit address. How many
computers would the Internet be able to
handle if this standard is adopted? Is this
enough? Answer: 2^128. That is at least enough
for the short term - 5000 addresses per
square micrometer of the Earth's surface!
9. When you buy a hard drive, 1 GB means
1,000 MB (megabytes) and 1 MB means 1,000
KB (kilobytes) and 1 KB means 1,000 bytes.
But when you buy memory, 1 GB means 1,024
MB, 1 MB means 1,024 KB, and 1 KB means
1,024 bytes. What percentage difference is
there in the amount of storage in a 100 MB
hard drive vs. 100 MB memory? 1 GB hard
drive vs. 1 GB memory? Answer: (1024/1000)^2 = 1.049,
or about 4.9%, and (1024/1000)^3 = 1.074, or about 7.4%.
10. Why does the following code fragment fail?
short a = 4;
short b = 5;
short c = a + b;
Answer: Java automatically promotes the sum
to be of type int. To assign the result to a
short, you need to explicitly cast it back c =
(short) (a + b). Yes, this is rather quirky.
int a = -5 >> 3;
int b = -5 >>> 3;
System.out.println(a);
System.out.println(b);
13. List all values a of type int for which (a ==
(a >> 1)). Hint: there is more than one.
14. Suppose a is a variable of type int. Find two
values of a for which (a == -a) is true.
Answer: 0 and -2147483648.
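15. What is the result of a = -1 * -2147483648?
Answer: -2147483648. The true product, 2147483648, does not fit in an int, so
the multiplication overflows and wraps around.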
16. What does the following code fragment compute?
c = 0;
while (b > 0) {
    if ((b & 1) == 1) c = c + a;
    b = b >> 1;
    a = a << 1;
}
Answer: c is the product of the original values of a and b.
18. What does the following code do to the integers
stored in two different variables a and b?
a = a ^ b;
b = a ^ b;
a = a ^ b;
19. Repeat the previous question, but assume a
and b are the same variable.
20. What does the following code do to the integers
stored in two different variables a and b? Any
problems with overflow?
a = a + b;
b = a - b;
a = a - b;
21. What do each of the following statements do?
x = - ~x;
x = ~ -x;
Answer: the first statement increments x; the second decrements x.
22. Modify Binary.java so that it converts from
base 7 to decimal and vice versa.
23. What does the following do?
public static boolean parity(int a) {
    a ^= a >>> 16;
    a ^= a >>> 8;
    a ^= a >>> 4;
    a ^= a >>> 2;
    a ^= a >>> 1;
    return (a & 1) == 1;
}
Answer: computes the parity of the number of
1 bits set in the binary representation of a
(true if the count is odd) using divide-and-conquer.
int cnt = 0;
for (int i = 1; i != 0; i = 2 * i) {
cnt++;
}
Hint: it's not an infinite loop.
Creative Exercises
0 = 0                        -1 = 11 (-2 + 1)
1 = 1                        -2 = 10
2 = 110 (4 + -2)             -3 = 1101 (-8 + 4 + 1)
3 = 111                      -4 = 1100
4 = 100                      -5 = 1111
5 = 101                      -6 = 1110
6 = 11010 (16 + -8 + -2)     -7 = 1001
7 = 11011
7. RGBA color format. Some of Java's classes
(BufferedImage, PixelGrabber) use a special encoding
called RGBA to store the color of each pixel. The
format consists of four integers, representing the
red, green, and blue intensities from 0 (not present)
to 255 (fully used), and also the alpha transparency
value from 0 (transparent) to 255 (opaque). The four
8-bit integers are compacted into a single 32-bit
integer. Write a code fragment to extract the four
components from the RGBA integer, and to go the
other way.
// extract
int alpha = (rgba >> 24) & 0xff;
int red = (rgba >> 16) & 0xff;
int green = (rgba >> 8) & 0xff;
int blue = (rgba >> 0) & 0xff;
// write back
rgba = (alpha << 24) | (red << 16) | (green << 8) | (blue << 0);
System.out.println(rgba);
8. Min and max. One of the following computes
min(a, b), the other computes max(a, b) without
branching. Which is which? Explain how it works.
1   1            6   00110
2   010          7   00111
3   011          8   0001000
4   00100        9   0001001
5   00101       10   0001010
14. Bit reversal. Write a function that takes an integer
input, reverses its bits, and returns that integer. For
example, if n = 8 and the input is 13 (00001101),
then its reversal is 176 (10110000).
   0    1    2    3    4    5    6  ...   12   13   14   15
0000 0001 0010 0011 0100 0101 0110  ... 1100 1101 1110 1111
0000 1000 0100 1100 0010 1010 0110  ... 0011 1011 0111 1111
   0    8    4   12    2   10    6  ...    3   11    7   15
16. Swap without temporary storage. What do
the following two code fragments do given integers a
and b?
a = a + b;
b = a - b;
a = a - b;
a = a ^ b;
b = a ^ b;
a = a ^ b;
Answer: each 3-line fragment swaps a and b.
It works provided a and b are not the same
variables (in which case both variables are
zeroed out).
XXYYYYZZ
XX = checksum of track offsets in seconds, taken mod
255
YYYY = length of the CD in seconds
ZZ = number of tracks on the CD
2. True or false. If a xor b = c, then c xor a = b
and c xor b = a.
3. Explain why the following code fragment does
not leave ABCD in variable c. How would you fix
it?
byte b0 = (byte) 0xAB;
byte b1 = (byte) 0xCD;
int c = (b0 << 8) | b1;
Answer. In Java, byte is a signed 8-bit
integer. Before the left shift and the bitwise or,
b0 and b1 are promoted to (negative) ints by sign
extension, so the high-order bits of c are all 1s.
To fix the problem, use c
= ((b0 & 0xff) << 8) | (b1 & 0xff);.
Copyright 2007 Robert Sedgewick and Kevin Wayne. All rights reserved.