Data Types
Unit-3 Topics
Introduction
Primitive data types
Character string types
User-defined ordinal types
Array types
Associative arrays
Record types
Union types
Pointer and reference types
Type Checking
Strong Typing
Type Equivalence
Introduction
DATA TYPE
• A data type defines a collection of data objects
and a set of predefined operations on those
objects.
TYPE SYSTEM
o Defines how a type is associated with each
expression in the language
Evolution of data types
FORTRAN I (1957)
Just types for INTEGER, REAL, arrays
Ada (1983)
Programmer able to create a user-defined type for every category of
variables in the problem space and have the system enforce the
types
Primitive Data Types
• Almost all programming languages provide a set
of primitive data types
COMPLEX
Some programming languages support a complex data
type, for example Fortran, Python, and C99.
Boolean Types
Boolean types are perhaps the simplest of all types.
Their range of values has only two elements: one for
true and one for false.
Boolean Types
In C89, numeric expressions are used as conditionals.
This is not the case in the later languages Java and C#,
which require Boolean expressions in conditionals.
Boolean types are often used to represent switches or
flags in programs.
Using a Boolean type for flags is more readable than using integers.
A Boolean value could be represented by a single bit.
One disadvantage is that a single bit of memory cannot be
accessed efficiently on many machines, so Boolean values are
often stored in a byte.
Primitive Data Types: Character
Stored as numeric codings
Most commonly used coding: ASCII
An alternative, 16-bit coding: Unicode (UCS-2)
Includes characters from most natural languages
Originally used in Java
C# and JavaScript also support Unicode
32-bit Unicode (UCS-4)
Supported by Fortran, starting with 2003
Non-Primitive Data Types
Non-primitive data types are constructed using
primitive data types.
Ex: arrays, sets, subranges, enumerations, pointers,
strings, structures, and unions.
Character Strings
A character string type is one in which the values
consist of sequences of characters.
Character string constants are used to label output,
and the input and output of all kinds of data are often
done in terms of strings.
Design Issues
The two most important design issues that are
specific to character string types are the
following:
Strings and Their Operations
The common string operations are assignment,
catenation, substring reference, comparison, and
pattern matching.
Enumeration type
An enumeration type is one in which the user
enumerates all of the possible values
Values are symbolic constants (identifiers)
Example (Ada)
Enumeration type
Design Issues
What operations are allowed for enumeration types
Ada has attribute operations
• Days'First gives the first day
• Days'Last gives the last day
• Days'Pos( today ) gives the Integer position in the enum list
• Days'Val( 3 ) gives the enum value associated with position 3
• Days'Pred( today ) gives the predecessor of today
• Days'Succ( today ) gives the successor of today
Should comparison operations =, <, <=, etc. be allowed?
Should a symbolic constant be allowed to be in more
than one type definition (overloading)?
Is coercion performed to or from enumeration values?
Enumeration choices
Pascal
Cannot overload enumeration constants
Enums can be used for array subscripts and case selectors
Enums can be compared
No operations for input or output
C and C++
Can be used like Pascal, but . . .
Coercion, as in “today++” or as in “int n = today”
Operations for input and output as integers
Ada
Can be used as in Pascal, but . . .
Enums may be overloaded
No coercion and allowed ranges are checked
Operations exist for input and output of enumeration values in text form
C#
Enumeration values are never implicitly coerced to integers
Enumeration type
Evaluation
Aid to readability
Names are easily recognized whereas coded values are not
E.g. – no need to code a color as a number
Aid to reliability
Compiler can check
• Operations on enums
– E.g. – don’t allow colors to be added
• Ranges of allowed values
– E.g. – Ada detects the error in day := Days'Succ( Saturday )
Implementation
Enumeration types are implemented as integers
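As a rough illustration of the design choices above, the sketch below uses Python's standard enum module. Python is not one of the languages compared on these slides, and the names Day, DayInt, succ, and adding_is_rejected are made up for the example. A plain Enum behaves like the no-coercion choice, while IntEnum behaves more like C, where enumeration values mix freely with integers.

```python
from enum import Enum, IntEnum

class Day(Enum):          # no coercion to or from integers
    MON = 0
    TUE = 1
    WED = 2

class DayInt(IntEnum):    # C-like: members are also integers
    MON = 0
    TUE = 1
    WED = 2

def succ(day):
    """Successor of a Day; raises ValueError past the last value."""
    return Day(day.value + 1)

def adding_is_rejected():
    """Return True if Day.MON + 1 is refused, as a checked enum should."""
    try:
        Day.MON + 1
    except TypeError:
        return True
    return False

print(succ(Day.MON))          # Day.TUE
print(adding_is_rejected())   # True
print(DayInt.MON + 1)         # 1 (coerced to a plain integer)
```

Calling succ(Day.WED) raises ValueError, which loosely mirrors Ada rejecting the successor of the last value.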
Subrange type
The subrange type is an ordered contiguous
subsequence of an ordinal type
Examples (Ada)
subtype Positive is Integer range 1 .. Integer'Last;
subtype Natural is Integer range 0 .. Integer'Last;
subtype Index is Integer range -1 .. 100;
Array storage mapping example
Storage mapping for 2-dim array b
Row-wise allocation is used
Access code for b[ i, j ] requires 2 adds and 2 multiplies
loc( b[ i, j ] )
= loc( b ) + w * ( (# elements in previous rows) + (# previous elements in row i) )
= loc( b ) + w * ( i * (# columns) + j )
[Figure: grid for array b with columns j = 0 to 4 and rows indexed by i; element b[ i, j ] is marked x]
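The mapping formula can be tried out with a small sketch. It assumes 0-based subscripts and row-wise allocation, as in the formula; the function name element_address and the sample values (base address 1000, 4-byte elements, 5 columns) are illustrative assumptions, not taken from the slides.

```python
def element_address(base, w, n_cols, i, j):
    """Address of b[i, j] for a row-major array with 0-based subscripts.

    base   -- loc(b), the address of b[0, 0]
    w      -- size of one element in bytes
    n_cols -- number of columns in each row
    """
    # Two multiplies (i * n_cols, w * ...) and two adds, as the slide states.
    return base + w * (i * n_cols + j)

# Hypothetical values: 4-byte elements, 5 columns, array stored at address 1000.
print(element_address(1000, 4, 5, 0, 0))   # 1000
print(element_address(1000, 4, 5, 2, 3))   # 1000 + 4 * (2*5 + 3) = 1052
```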
Arrays
Subscript types
FORTRAN, C, and Java
Integer only
Pascal and Ada
Any ordinal type
• Integer, Boolean, Character, enum
Range checking
Java, ML, C# check the range of all subscripts
C, C++, Perl, Fortran do not
Ada checks by default, but this can be disabled by a
compiler pragma
Array binding and allocation
We consider the following categories of arrays
Static array
Fixed stack-dynamic array
Stack-dynamic array
Fixed heap-dynamic array
Heap-dynamic array
These are based on when the subscript ranges are
bound and when storage is allocated
Array binding and allocation
Static arrays
Range of subscripts and storage bindings are static
e.g. FORTRAN 77, some arrays in Ada, C/C++ static arrays
Advantage
Execution efficiency
No run-time overhead for allocation or deallocation
Fixed stack-dynamic arrays
The range of subscripts is statically bound
Storage is bound at elaboration time
e.g. – most local variable arrays
Advantage: space efficiency
Array binding and allocation
Stack-dynamic arrays
The index range and storage allocation are dynamic,
but fixed from then on for the variable’s lifetime
Advantage: flexibility
Size need not be known until the array is about to
be used
E.g. – Ada declare blocks
n := <expression>;
declare
   a : array (1..n) of Float;
begin
   null;  -- statements that use a go here
end;
Array binding and allocation
Fixed heap-dynamic arrays
Like stack-dynamic arrays except . . .
Storage is allocated on the heap
The index range binding and storage allocation are initiated by program
request rather than by subprogram elaboration
E.g. – all Java arrays
Heap-dynamic arrays
The subscript range and storage bindings are dynamic and may
subsequently be changed
Supported by Smalltalk (e.g. – OrderedCollection), APL,
Perl, JavaScript, FORTRAN 90, and the C# ArrayList class
Arrays
Number of subscripts
FORTRAN I allowed up to three
FORTRAN 77 allows up to seven
Other languages - no limit
Array initialization
Some languages permit initialization of arrays
Fortran
Integer List( 3 )
Data List / 21, 67, 9 /
C
int list [ ] = { 21, 67, 9 };
Ada “aggregates”
list : array( 1 .. 3 ) of Integer := ( 21, 67, 9 );
list : array( 1 .. 100 ) of Integer := ( 10 => 21, 20 => 67, 30 => 9, others => 0 );
list : array( 1..10, 1..3 ) of Integer := (1 => (1,2,3), 10 => (4,5,6), others => (0, 0,0));
Array operations
An array operation operates on an array or a part
of an array as a unit
Ada operations
Assignment
Catenation (1-dim only)
Equality (=) and inequality (/=)
APL
Most powerful array-processing language ever devised
Many array operations
Slices
A slice is some substructure of an array
It is nothing more than a referencing mechanism
Slices are only useful in languages that have array
operations
Fortran slices at right
Ada slices below
a : array (1..100) of Float;
a( 1..50 ) := a( 51..100);
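For comparison, the same slice assignment can be sketched in Python, which is used here only as an illustration. Python slices are 0-based and exclude the upper bound, so a[0:50] corresponds to the Ada slice a( 1..50 ). Note that a Python list slice on the right-hand side produces a copy rather than a pure reference.

```python
# A 100-element list standing in for the Ada array a (1..100).
a = [float(k) for k in range(1, 101)]

# Ada: a( 1..50 ) := a( 51..100 );
a[0:50] = a[50:100]

print(a[0], a[49])   # 51.0 100.0
print(len(a))        # 100
```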
Associative arrays
An associative array is an unordered collection of data
elements that are indexed by an equal number of values
called keys
Also called a . . .
Map
Key-value table
Dictionary
Perl example
An associative array is called a hash in Perl
Names begin with %
Aggregate literals are delimited by parentheses
E.g. – %temps = ("Monday" => 77,"Tuesday" => 79,…);
Subscripting is done using braces and keys; a single element is referenced with $
E.g. – $temps{ "Wednesday" } = 83;
Elements can be removed with delete
E.g. – delete $temps{ "Tuesday" };
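The same operations can be sketched with Python's built-in dict, shown only as a parallel to the Perl hash. The variable name temps and the two starting entries mirror the Perl example; nothing else is taken from the slides.

```python
# Python's built-in dict plays the role of the Perl hash %temps.
temps = {"Monday": 77, "Tuesday": 79}

temps["Wednesday"] = 83      # subscripting with a key adds or updates an element
del temps["Tuesday"]         # remove an element by key

print("Tuesday" in temps)    # False
print(temps["Wednesday"])    # 83
print(sorted(temps))         # ['Monday', 'Wednesday']
```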
Records
A record is an aggregate of named data elements of
possibly diverse types
[Figure: a compile-time descriptor for a record]
The offset of each field is from the record base address
Design Issues
What is the form of references?
Records
Called the struct data type in C, C++, and C#
A class defines a record in Java and Smalltalk
Record declarations
COBOL uses level numbers to show nested records
Other languages use a recursive definition
Field references
COBOL
<fieldName> OF <recordName2> OF<recordName1>
Other languages use dot notation
<recordName1>.<recordName2>.<fieldName>
Records
Fully qualified field references must include all
nested record names
Elliptical references allow leaving out record
names as long as the reference is unambiguous
Pascal provides a with clause to abbreviate
references
Record Operations
Assignment
Allowed in Pascal, Ada, and C if the types are identical
In Ada, the RHS can be a record aggregate constant
COBOL uses “MOVE CORRESPONDING”
Moves all fields in the source record to fields with the same
names in the destination record
Initialization
Allowed in Ada, using an aggregate constant
In Java, done by the constructor
Comparison
Ada has tests for equality = and /=
Arrays vs. records
Access to array elements can be much slower than
access to record fields
Each record field is accessed with a fixed offset from
the record base address, known at compile time
Array subscripts that are not constants require run-time calculation
Union types
A union is a type whose variables are allowed to store
different type values at different times during execution
Design issue for unions
How should type checking be done?
Examples
Fortran has EQUIVALENCE
No type checking
C and C++ have free unions
Not part of structs
Complete freedom from type checking
Pascal embeds unions in records
Design leads to ineffective type checking
Discriminated unions
Algol 68 and Ada use discriminated unions
This provides secure type checking
Ada
Ada embeds discriminated unions in records
One record field is called a discriminant or tag
The discriminant in the example on the following
slide is Form
Ada example
type Shape is ( Circle, Triangle, Rectangle );
type Colors is ( Red, Green, Blue );
type Figure( Form : Shape ) is record
   Filled : Boolean;
   Color : Colors;
   case Form is
      when Circle =>
         Diameter : Float;
      when Triangle =>
         LeftSide : Integer;
         RightSide : Integer;
         Angle : Float;
      when Rectangle =>
         Height : Integer;
         Width : Integer;
   end case;
end record;
The discriminant field Form may not be changed in isolation
It may only be changed by assigning to the entire record
This prevents the record fields from becoming inconsistent
Ada example
Assignment using a record aggregate
Fig : Figure;
Fig := ( Filled => true, Color => Blue, Form => Rectangle, Height => 12, Width => 3 );
Pointer types
Pointer type values consist of memory addresses
and the special value nil (or null)
Pointers are used for
Indirect addressing
Management of heap-dynamic variables
These are anonymous variables
Pointer operations
Assignment operation
Sets a pointer to a useful address
Dereferencing operation
Interprets the pointer variable as representing the
object at the memory address contained in the pointer
variable
Thus, it applies one level of indirect addressing
Deallocation
Returns the heap-dynamic storage referred to by a
pointer to the system for reallocation
Problems with pointers
Dangling pointers
A dangling pointer refers to a heap-dynamic variable
that has been deallocated
To create a dangling pointer in Pascal with explicit
deallocation . . .
Allocate a heap-dynamic variable pointed to by p
Make an alias for the pointer: q := p
Explicitly deallocate the heap-dynamic variable: dispose( p );
Now q contains a dangling pointer
Problems with pointers
Lost heap-dynamic variables
A lost heap-dynamic variable is no longer referenced
by any program pointer and is inaccessible
To create a lost heap-dynamic variable . . .
Allocate a heap-dynamic variable pointed to by p
Replace the pointer in p by a reference to some other heap-
dynamic variable: p := q
Now the first heap-dynamic variable is inaccessible
The process of losing heap-dynamic variables is called
memory leakage
Pointers in C and C++
Pointers in C and C++ are similar to addresses in
assembly language
Pointers may point virtually anywhere in memory
Pointer arithmetic is possible
Programmer is responsible for avoiding problems
of dangling pointers and lost heap-dynamic
variables
Pointers in C and C++
Dereferencing is explicitly specified with the * operator
In C++, reference type variables are constant pointers declared
with the & operator
References are always implicitly dereferenced
Used for parameter passing
pass-by-reference
Pointers in Java
These are called reference types
Refer to heap-dynamic objects exclusively
No pointer arithmetic
Reference fields and array elements are initialized to null by default
No explicit deallocation
This prevents the dangling pointer problem
All objects are implicitly deallocated by garbage collection
Garbage collection prevents the lost heap-dynamic variable
problem
Reference variables are implicitly dereferenced whenever
the dot notation is used, as in p.link
Dangling pointer problem
The problem of dangling pointers can be resolved
using . . .
Tombstones
Locks and keys
Tombstones
Tombstone
An extra heap cell that is a pointer to the heap-dynamic variable
The actual pointer variable points only at a tombstone
When a heap-dynamic variable is deallocated, the tombstone remains
but is set to null
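The tombstone idea can be simulated in a few lines. This is only a sketch in Python, which has no explicit deallocation, so the classes Tombstone and Pointer and the functions allocate and deallocate are hypothetical stand-ins for what a language run-time would do.

```python
class Tombstone:
    """Extra cell that holds the only direct reference to the heap variable."""
    def __init__(self, target):
        self.target = target

class Pointer:
    """A program pointer; it refers to a tombstone, never to the data itself."""
    def __init__(self, tombstone):
        self.tombstone = tombstone

    def deref(self):
        if self.tombstone.target is None:
            raise RuntimeError("dangling pointer: variable was deallocated")
        return self.tombstone.target

def allocate(value):
    # A one-element list stands in for a heap cell.
    return Pointer(Tombstone([value]))

def deallocate(pointer):
    # The tombstone remains but is set to null.
    pointer.tombstone.target = None

p = allocate(42)
q = Pointer(p.tombstone)                 # alias: q := p
print(q.deref()[0])                      # 42
deallocate(p)
try:
    q.deref()
except RuntimeError as err:
    print(err)                           # dangling pointer: variable was deallocated
```

Because every alias goes through the same tombstone, deallocating through p is detected when q is later dereferenced. The cost is one extra level of indirection on every access and a tombstone that is never reclaimed.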
Locks and keys
The locks-and-keys technique represents pointer values
as a key-address pair
Each heap-dynamic variable is represented as storage for the
data plus a cell for the lock
When a heap-dynamic variable is allocated, a lock value is
created and a copy is placed in both . . .
A lock cell within the heap-dynamic variable
The key cell of the pointer
When a heap-dynamic variable is deallocated, its lock
value is cleared
Every dereference must compare the key value in the
pointer to the lock in the heap-dynamic variable; a mismatch is a run-time error
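A minimal simulation of locks and keys follows, again in Python and only as a sketch. The names HeapVariable, Pointer, allocate, deallocate, and CLEARED are assumptions made for the example, and an object reference stands in for the address half of the key-address pair.

```python
import itertools

_lock_values = itertools.count(1)   # source of fresh, nonzero lock values
CLEARED = 0                         # value stored in a lock cell after deallocation

class HeapVariable:
    def __init__(self, data, lock):
        self.lock = lock            # lock cell stored with the data
        self.data = data

class Pointer:
    def __init__(self, key, address):
        self.key = key              # key cell of the pointer
        self.address = address      # here simply a reference to the HeapVariable

    def deref(self):
        # Every dereference compares the pointer's key with the variable's lock.
        if self.key != self.address.lock:
            raise RuntimeError("dangling pointer: key does not match lock")
        return self.address.data

def allocate(data):
    lock = next(_lock_values)
    return Pointer(lock, HeapVariable(data, lock))

def deallocate(pointer):
    pointer.address.lock = CLEARED

p = allocate("payload")
q = Pointer(p.key, p.address)       # copying a pointer copies the key too
print(q.deref())                    # payload
deallocate(p)
print(q.key == q.address.lock)      # False, so q.deref() would now raise
```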
Heap management
Takes deallocation of heap-dynamic variables out
of the hands of programmers
Two popular solutions
Reference counters
Reclamation is incremental and is done when inaccessible cells are created
Garbage collection
Reclamation occurs when available heap space runs out
Reference counters
The reference counter solution maintains a counter in
every heap cell
The counter stores the number of pointers currently pointing at
the cell
Whenever a pointer is changed . . .
The counter in the old target is decremented
The counter in the new target is incremented
When a counter decrements to zero, the heap-dynamic
variable is returned to the list of available space
Disadvantages
Space required by the reference counters
Time overhead
Complications for cells in circular linked lists
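The counting rules and the problem with circular structures can be seen in a small simulation. This is a sketch under simplifying assumptions: each cell has a single pointer field, and the names Cell, retain, release, and free_list are invented for the example rather than taken from any real run-time system.

```python
class Cell:
    """A heap cell with a reference counter and one outgoing pointer field."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.link = None

free_list = []                      # names of cells returned to available space

def retain(cell):
    """Record one more pointer to cell and return it."""
    if cell is not None:
        cell.count += 1
    return cell

def release(cell):
    """Drop one pointer to cell; reclaim it if the counter reaches zero."""
    if cell is None:
        return
    cell.count -= 1
    if cell.count == 0:
        free_list.append(cell.name)
        release(cell.link)          # the reclaimed cell no longer points at its target
        cell.link = None

# p -> a -> b : the whole chain is reclaimed as soon as p is dropped.
a, b = Cell("a"), Cell("b")
p = retain(a)
a.link = retain(b)
release(p)
print(free_list)                    # ['a', 'b']

# c <-> d : a cycle keeps both counters above zero after the last pointer is dropped.
c, d = Cell("c"), Cell("d")
r = retain(c)
c.link = retain(d)
d.link = retain(c)
release(r)
print(c.count, d.count)             # 1 1
print(free_list)                    # ['a', 'b']
```

The second half shows the circular-list complication: c and d are unreachable from the program but are never returned to the free list.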
Garbage collection
When heap storage is exhausted, perform garbage
collection as follows
Every heap cell has an extra bit used by the garbage
collection algorithm
All bits are initially cleared (assumed to be garbage)
Starting with all program pointers, recursively follow all
pointers and mark any heap-dynamic variable that can
be reached
All variables that remain unmarked are then returned to
the list of available heap cells
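The marking steps can be sketched as follows. The heap is modeled as a dictionary from cell names to the names they point to, which is an assumption made for illustration; the function name mark_sweep and the sample heap are likewise hypothetical.

```python
def mark_sweep(heap, roots):
    """Return the sorted names of heap cells that are unreachable from roots.

    heap  -- dict mapping a cell name to the list of cell names it points to
    roots -- names held directly by program pointers
    """
    marked = {name: False for name in heap}   # all mark bits initially cleared

    def mark(name):
        if marked[name]:
            return                            # already visited; also stops on cycles
        marked[name] = True
        for target in heap[name]:
            mark(target)

    for root in roots:
        mark(root)

    # Sweep: everything still unmarked is garbage.
    return sorted(name for name, bit in marked.items() if not bit)

# Hypothetical heap: a -> b -> c, plus an unreachable cycle d <-> e.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
print(mark_sweep(heap, roots=["a"]))          # ['d', 'e']
```

Unlike reference counting, the unreachable cycle d <-> e is reclaimed here, because reachability is computed from the program pointers rather than from per-cell counters.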
Garbage collection
Disadvantage
When you need it most, it works the worst
You need it most when there is very little actual garbage left
in the heap
The garbage collection algorithm is very time consuming in
this situation
Type checking
Type checking is the activity of ensuring that types are
compatible when considering . . .
the operands of an operator
the parameters and return type of a method
the two sides of an assignment statement
A compatible type is one that is either a legal type or one
that may be coerced to a legal type for the given situation
A coercion is an automatic type conversion that is
allowed under language rules and is implicitly performed
by compiler-generated code
A type error is the use of an incompatible type in a given
situation
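A short Python sketch can illustrate the three terms. Python checks operand types at run time rather than in compiler-generated code, so this only shows the distinction between a legal type, a coerced type, and a type error; the helper name check_add is made up for the example.

```python
def check_add(left, right):
    """Try left + right; report the result type or the type error."""
    try:
        return type(left + right).__name__
    except TypeError:
        return "type error"

print(check_add(1, 2.5))      # float: the int operand is coerced to float
print(check_add("1", "2"))    # str: both operands already have a legal type
print(check_add("1", 2))      # type error: str and int are not compatible for +
```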
Type checking
If all type bindings to variables are static, nearly all
type checking can be static
If type bindings are dynamic, type checking must
be dynamic
A programming language is strongly typed if type
errors are always detected
This definition from the text is not the standard
definition
Under this definition, Smalltalk would be strongly typed
The usual definition requires that the single type of
each variable name is known at compile time
Strong typing
Advantage
Allows the detection of type errors due to misuse of variables
Language examples:
FORTRAN 77 is not (parameters, EQUIVALENCE)
Pascal is not (only because of variant records)
C and C++ are not
Parameter type checking can be avoided
Unions are not type checked
Ada almost is (UNCHECKED_CONVERSION is loophole)
Java and C# are similar to Ada
They allow explicit casts
Strong typing
Coercion rules strongly (and negatively) affect
strong typing
Fortran, C, and C++ are significantly less reliable than
Ada, in which all type conversion is explicit
Java is between C++ and Ada with about half the
assignment coercions of C++
Type equivalence
When are variables declared using user-defined types
compatible?
Name type equivalence means that two variables have
equivalent types when they are declared in the same
declaration or in declarations that use the same type name
Easy to implement but highly restrictive
Ada example
type IndexType is range 1..100;
count : Integer;
index : IndexType;
Variables count and index are not compatible
They don’t use the same type name
Assignments count := index; and index := count; are illegal
Type equivalence
Structure type equivalence means that two
variables have equivalent types if their types have
identical structures
More flexible, but harder to implement
The entire structures of both types must be compared
Are two record (structure) types equivalent if they have the
same structure but different field names?
Are two array types equivalent if the subscript ranges are
different?
It is not possible to distinguish between types with the
same structure which represent different kinds of data
How can you avoid mixing counts of apples and oranges if
they are both integer types?
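The apples-and-oranges question can be made concrete with a sketch that applies both rules to the same pair of types. Python is used only for illustration; the classes Apples and Oranges and the two predicate functions are hypothetical, and the structural rule here compares field types in order while ignoring field names, which is one possible answer to the design questions above.

```python
from dataclasses import dataclass, fields

@dataclass
class Apples:
    count: int

@dataclass
class Oranges:
    count: int

def name_equivalent(x, y):
    """Equivalent only if both values were declared with the same type name."""
    return type(x) is type(y)

def structure_equivalent(x, y):
    """Equivalent if the field types match in order; field names are ignored here."""
    return [f.type for f in fields(x)] == [f.type for f in fields(y)]

a, o = Apples(3), Oranges(5)
print(name_equivalent(a, o))        # False
print(structure_equivalent(a, o))   # True
```

Under the name rule the two counts cannot be mixed; under the structural rule they are indistinguishable.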
Ada examples
Ada usually requires name type equivalence but avoids
most restrictions by having derived types and subtypes
Derived types
A different type that has the same structure as a base type
Example of incompatible derived types
type Celsius is new Float;
type Fahrenheit is new Float;
Subtypes
A possibly range-constrained version of a base type
Example
subtype IndexType is Integer range 1..100;
count : Integer;
index : IndexType;
Variables count and index are now compatible
Ada examples
Ada uses structure type equivalence for “unconstrained
array” types
vec1 and vec2 are equivalent
type Vector is array( Integer range <>) of Float;
vec1 : Vector( 1..10 );
vec2 : Vector( 11.. 20 );
Care must be taken with “constrained” anonymous types
A and B are incompatible
A : array( 1..10 ) of Integer;
B : array( 1..10 ) of Integer;
A and B are still incompatible
A, B : array( 1..10 ) of Integer;
C and C++
C uses structure type equivalence for all types
except struct, enum, and union, which use name type equivalence
The exception is when two structures or unions are defined in
different files
Then structure type equivalence is again used
C++ uses name type equivalence
typedef in C and C++ simply creates an alias for a
type