Jake Wright
November 7, 2013
Contents

1 Introduction to operating systems
1.1 Abstract view of an operating system
1.1.1 Multiprogramming
1.1.2 Time-sharing
1.1.3 Monolithic operating systems
1.1.4 Dual-mode operation
1.1.4.1 Real mode and protected mode
1.2 Bits and bytes
1.2.1 Words
1.3 Endianness
1.4 Memory
1.5 Protecting I/O, memory and the CPU
1.5.1 Protecting memory
1.5.2 Protecting the CPU
1.6 Kernels and microkernels

2 Processes
2.1 Process creation
2.1.1 The fork system call
2.2 Process termination
2.3 Process hierarchies
2.4 Process states
2.5 Implementation of processes
2.5.1 Process control block
2.5.2 Interrupt vectors
2.6 Threads
2.7 The boot process
2.7.1 Booting UNIX
2.7.2 Booting Windows

3 Scheduling
3.1 Context switch
3.1.1 Heavyweight and lightweight processes
3.2 Process behaviour
3.3 When to schedule
3.4 Pre-emptive and non-pre-emptive schedulers
3.5 Idle system
3.6 Scheduling algorithm goals
3.7 Algorithms
3.7.1 First-come first-served
3.7.2 Shortest job first
3.7.3 Shortest remaining time first
3.7.4 Round-robin
3.8 Static and dynamic priority scheduling algorithms
3.8.1 Static priority scheduling
3.8.2 Dynamic priority scheduling

4 Memory management
4.1 Basic memory management
4.1.1 Monoprogramming without swapping or paging
4.1.2 Multiprogramming with fixed partitions
4.2 The address binding problem
4.2.1 Compile time
4.2.2 Load time
4.2.3 Run-time
4.3 Mapping logical to physical addresses at run-time
4.4 Dynamic Partitioning
4.4.1 Compaction
4.5 Virtual Memory
4.5.1 Paging
4.5.1.1 Page tables
4.5.1.2 Page table structure
4.5.1.3 Paging pros and cons
4.5.1.4 Multilevel page tables
4.5.1.5 Structure of a page table entry
4.5.2 TLBs (Translation Lookaside Buffers)
4.5.3 Inverted page tables
4.5.4 Shared pages
4.5.5
4.5.6 Segmentation
4.5.7
4.5.8

5 I/O subsystem
5.1 I/O Devices
5.2 Memory Mapped I/O
5.3 Data transfer from a device driver point of view
5.3.1 Polled I/O
5.3.2 Interrupt-driven I/O
5.3.3 Direct memory access (DMA)
5.4 Data transfer from a user's point of view
5.4.1 Blocking
5.4.2 Non-blocking
5.4.3 Asynchronous
5.5 I/O Buffering
5.6 Caching
5.6.1 Temporal locality of reference
5.6.2 Spatial locality of reference

6 File systems
6.1 Files
6.1.1 File names
6.1.2 File meta-data
6.1.3 File operations
6.1.4 Hardlinks
6.1.5 Symbolic (soft) links
6.2
6.3

7 Protection
7.1 Authentication
7.1.1 One-way functions
7.1.2 Salts
7.2 Trojan Horses
7.3 Viruses
7.3.1 Memory resident viruses
7.3.2 Boot sector viruses
7.4 Worms
Figure 1.1: An abstract view of the operating system sitting in-between the hardware
layer and the user program layer.
1.1.1 Multiprogramming
Originally, when the current job paused to wait for an I/O operation to complete, the
CPU would sit and wait for it. With I/O bound jobs, this can waste a lot of time.
The solution was multiprogramming which involves partitioning memory into several
pieces, with a different job in each partition. While one job was waiting for I/O to
complete, another job could be using the CPU.
1.1.2 Time-sharing
Third generation operating systems were well suited for big scientific calculations, but
programmers wanted to be able to have the machine to themselves for a few hours, so
they could debug their programs quickly. This desire for quick response times paved
the way for time-sharing, a variation of multiprogramming, in which each user has an
online terminal.
1.2.1 Words
A word refers to the size of the processor's registers. That is, a word on a 32-bit processor
is 4 bytes (32 bits), but on a 64-bit processor, a word will be 64 bits long. A word is
therefore the number of bits that the processor can handle as a single unit.
1.3 Endianness
Endianness refers to the order of bytes in a word. A big-endian system stores the most
significant byte first while a little-endian system stores the least significant byte first.
Consider the decimal number 1000. It is common to represent numbers in hexadecimal
(base 16) because each hexadecimal digit represents four bits (so two hex digits represent
one byte), so let's first convert to that:

n      n DIV 16   n MOD 16
1000   62         8
62     3          14 (E)
3      0          3

Reading the remainders from bottom to top, 1000 in decimal is 3E8 in hexadecimal.
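The conversion, and the difference between the two byte orders, can be checked with a short Python sketch (the struct format characters > and < select big- and little-endian packing):

```python
import struct

# Convert decimal 1000 to hexadecimal by repeated DIV/MOD 16, as in the table.
n, digits = 1000, []
while n > 0:
    digits.append("0123456789ABCDEF"[n % 16])  # each remainder is the next hex digit
    n //= 16
hex_value = "".join(reversed(digits))           # remainders read bottom-up
print(hex_value)  # 3E8

# Pack 0x3E8 into a 4-byte word in each byte order.
big = struct.pack(">I", 0x3E8)     # big-endian: most significant byte first
little = struct.pack("<I", 0x3E8)  # little-endian: least significant byte first
print(big.hex())     # 000003e8
print(little.hex())  # e8030000
```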
1.4 Memory
Because faster memory is more expensive, only a small amount of it is used. We then
encounter a hierarchy of memory, getting slower but larger in size as we move away from
the CPU.
2 Processes
A process is a program in execution. Each program has its own address space: a list of
memory locations from some minimum (usually 0) to some maximum, which the process
can read and write to. The address space contains the executable program, the program's
data, and its stack. Also associated with each process is a set of registers, including the
program counter, stack pointer, and other hardware registers. Conceptually, each process
has its own virtual CPU. In reality, the CPU switches back and forth from process to
process (this rapid switching back and forth is multiprogramming).
The last situation in which processes are created applies to batch systems. Here users
can submit batch jobs to the system. When the operating system decides that it has the
resources to run another job, it creates a new process and runs the next job from the
queue.
Figure 2.1: A process can be in running, blocked, or ready state. Processes entering the
system are in the new and exit states.
The process control block contains the process's state, its program counter, stack pointer, memory allocation, the status of its open files,
its accounting and scheduling information, and everything else about the process that
must be saved when the process is switched from running to ready or blocked state so
that it can be restarted later as if it had never been stopped.
6. An assembly language procedure starts up the new current process.
2.6 Threads
So far, we have assumed that each process executes separately from the others and each
process has its own address space. However, sometimes it is useful to split processes up
and have each part run in parallel, but in the same address space. To implement
this idea, we introduce the concept of threads.
In a system using the thread model, a process can be thought of as a group of resources.
A process still has its own virtual address space, but now contains threads. Each thread
has its own scheduling state (ready, running, blocked, etc.), registers and stack.
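A short Python sketch illustrates the point: threads share their process's data while each has its own flow of control, and the lock is needed precisely because the address space is shared (the counter and thread count are arbitrary):

```python
import threading

counter = {"value": 0}   # shared data: all threads see the same address space
lock = threading.Lock()

def worker():
    # Each thread has its own stack and registers, but updates shared memory.
    for _ in range(100_000):
        with lock:       # without the lock, increments could interleave and be lost
            counter["value"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])  # 400000
```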
When the kernel has finished loading, the init process is executed. The init process reads the
file /etc/inittab to find out whether the system should enter single-user mode or multi-user
mode, and then starts background processes.
In multi-user mode, init forks a getty process for each terminal listed in /etc/ttys. Getty
prints the login prompt on screen. When somebody supplies a username, /bin/login is
executed, which asks for the password and authenticates the user. If authentication is
successful, login forks the user's shell.
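The fork/exec chain described above (init forks getty, login execs the shell) can be sketched with a minimal POSIX-only Python example; /bin/echo stands in for the user's shell here:

```python
import os

# Parent forks a child; the child replaces its image with a new program,
# just as login execs the user's shell after authentication.
pid = os.fork()
if pid == 0:
    # Child process: os.fork() returned 0 here.
    os.execv("/bin/echo", ["/bin/echo", "child running"])
else:
    # Parent process: pid is the child's PID; wait for it to terminate,
    # like init waiting on its children.
    _, status = os.waitpid(pid, 0)
    print("child", pid, "exited with status", os.WEXITSTATUS(status))
```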
3 Scheduling
The part of the operating system that decides which process to run next is called the
scheduler. The scheduler has an important decision to make each time a new process
is to be run. The chosen process has a large effect on the perceived performance of the
system and, in addition to picking the right process to run, the scheduler also has to
worry about making efficient use of the CPU, because process switching is expensive.
Processes can be described as either:
I/O-bound - spends more time doing I/O than computation; has many short CPU
bursts.
CPU-bound - spends more time doing computations; has a few very long CPU
bursts.
Enforcing the system's policies is also important. If the local policy is that safety
control processes get to run whenever they want to, the scheduler has to make sure this
policy is enforced.
Another goal is keeping all parts of the system busy. If the CPU and all I/O devices
can be kept running all the time, more work gets done per second.
Throughput is the number of jobs per hour that the system completes. It is desirable
to maximise this.
Turnaround time is the average time taken to complete a job. It measures how long the
average user has to wait for the output. It is desirable to minimise this. In an interactive
system, this is known as response time. That is, the time between issuing a command
and getting the result.
Finally, it is necessary to minimise the waiting time, i.e. the amount of time a process
has been waiting in the ready queue.
Sensible scheduling strategies might be to:
Maximise throughput or CPU utilisation
Minimise average turnaround time, waiting time and response time
3.7 Algorithms
There are four main scheduling algorithms that we will discuss. Note that a burst time is
the time between scheduling and a process giving up its time slice. When a process gives
up its time slice, it is known as yielding. That is, it is essentially saying "I have nothing
useful to do, please schedule something else."
3.7.4 Round-robin
Round-robin is another pre-emptive algorithm. Each process is given a time-slice (called
a quantum), after which the process is put on the back of the queue and the next process
is scheduled from the front of the queue. If the quantum size is too large, the algorithm
tends towards simulating FCFS. If it is too small, the context switch overhead is too high
to make the algorithm efficient. A quantum size is typically 10ms.
Round-robin is fair - every process gets 1/nth of the CPU.
No process is kept waiting for longer than any other process.
Typically has a higher average turnaround time than SRTF, but has a better average
response time.
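A round-robin run can be simulated in a few lines of Python. This is a sketch: all processes are assumed to arrive at time 0, and context-switch overhead is ignored, so a process's turnaround time equals its completion time.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling; returns turnaround time per process."""
    queue = deque(burst_times.items())   # ready queue of (name, remaining burst)
    clock, turnaround = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)    # run one quantum, or less if it finishes
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the queue
        else:
            turnaround[name] = clock     # finished: record completion time
    return turnaround

print(round_robin({"A": 3, "B": 5, "C": 2}, quantum=2))  # {'C': 6, 'A': 7, 'B': 10}
```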
4 Memory management
Many processes are held in memory simultaneously, and every process has, in memory:
A code segment (or text segment), containing the program's instructions
A data segment (read-write), containing global and static variables
A stack, containing parameters, automatic and temporary variables
A heap, for dynamically allocated variables
The operating system also needs memory for itself. It is up to the memory management
subsystem to handle:
1. Relocation
2. Allocation
3. Protection
4. Sharing
5. Logical organisation
6. Physical organisation
4.2.3 Run-time
Hardware can be used to automatically translate between program addresses and real
addresses at run-time. No changes are required to the program code. Providing the
necessary hardware is available, this is the most flexible scheme.
Best fit - search the entire list to find the best-fitting hole, i.e. the smallest hole which
is large enough
Worst fit - counterintuitively, allocate the largest hole (again, searching the whole
list) to avoid filling memory with tiny, useless holes
Allocating more memory than necessary would be done if a process's data segment is
expected to grow. When a process is swapped in or out of memory, it is wasteful to
also swap extra memory so only the memory actually in use should be swapped. If
processes can have two growing segments, for example, the data segment (used as a heap
for variables that are dynamically allocated and released) and a stack segment (for the
normal local variables and return addresses) an alternate arrangement, as shown in figure
4.1, is convenient. We can have the program, and then the data segment, and then the
stack segment, with extra space allocated in between the data and the stack segments,
which can be used by either one. The data segment can grow up into the space while the
stack segment can grow down.
When a process terminates, its memory returns to the free list, coalescing holes together
where appropriate.
Figure 4.1: The structure of a process address space with extra space allocated to allow
the data and stack segments to expand.
External fragmentation can occur with dynamic partitioning. This is when the total
available memory is sufficient but it is unusable because it is split into many holes. The
solution to this is compaction.
4.4.1 Compaction
When swapping creates multiple holes in memory, it is possible to combine them all into
one big one by moving all the processes downward as far as possible. This technique is
known as memory compaction. It is usually not done because it requires a lot of CPU
time. It also requires run-time relocation.
4.5.1 Paging
Addresses can be generated by a program using indexing, base registers, segment registers,
and other ways. These program-generated addresses are called virtual addresses and
form the virtual address space. On computers without virtual memory, the virtual
address is put directly onto the memory bus and causes the physical memory word with the
same address to be read or written. When virtual memory is used, the virtual addresses
do not go directly to the memory bus. Instead, they go to the Memory Management
Unit (MMU) that maps the virtual addresses onto the physical memory addresses.
The virtual address space is divided up into units called pages. The corresponding units
in the physical memory are called page frames. The pages and page frames are always
the same size (typically 4 KB or 8 KB). In hardware, a Present/absent bit keeps track
of which pages are physically present in memory. If a program tries to use an unmapped
page, the MMU causes the CPU to trap to the operating system. This trap is called a
page fault. The OS picks a little-used page frame and writes its contents back to the
disk. It then fetches the page just referenced into the page frame just freed, changes the
map, and restarts the trapped instruction.
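The MMU's mapping can be sketched in Python with a toy page table; the table contents and the 4 KB page size below are illustrative assumptions:

```python
PAGE_SIZE = 4096  # 4 KB pages

# Toy page table: virtual page number -> (present, page frame number)
page_table = {0: (True, 2), 1: (True, 7), 2: (False, None)}

def translate(vaddr):
    """Map a virtual address to a physical one, mimicking what the MMU does."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split into page number + offset
    present, frame = page_table[vpn]
    if not present:
        # In hardware this is where the MMU makes the CPU trap to the OS.
        raise LookupError("page fault: page %d not in memory" % vpn)
    return frame * PAGE_SIZE + offset        # same offset within the page frame

print(hex(translate(0x1234)))  # 0x7234: page 1 maps to frame 7, offset 0x234 kept
```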
Figure 4.2: The internal operation of the MMU with 16 4 KB pages.
The simplest design is to have a single page table consisting of an array of fast hardware
registers with one entry for each virtual page. When a process is started, the OS loads
the registers with the process's page table, taken from a copy kept in main memory.
This method is straightforward and requires no memory references during mapping. A
disadvantage is that it is expensive if the page table is large. Having to load the full page
table at every context switch hurts performance.
Alternatively, the page table can be entirely in main memory. In this case, only one
register is needed in the MMU to point to the start of the page table (PTBR - page
table base register). Only one register needs to be changed when context switching but
more memory references are needed to read page table entries during execution of an
instruction.
Memory allocation is easier, no external fragmentation occurs and there is a clear separation between user and system view of memory usage. However, the operating system
must keep a page table per process, internal fragmentation occurs and there is an
additional overhead when context switching.
To get around the problem of having to store huge page tables in memory all the time,
many computers use a multilevel page table. Each virtual address is split into the level
2 page table offset, the level 1 page table offset, and the page offset.
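The split can be sketched for the common 10/10/12 layout of a 32-bit address (other architectures divide the bits differently; this layout is an assumption for illustration):

```python
def split_address(vaddr):
    """Split a 32-bit virtual address into level-1 index, level-2 index
    and page offset, assuming a 10/10/12-bit layout."""
    offset = vaddr & 0xFFF        # low 12 bits: offset within the 4 KB page
    l2 = (vaddr >> 12) & 0x3FF    # next 10 bits: index into a level-2 table
    l1 = (vaddr >> 22) & 0x3FF    # top 10 bits: index into the level-1 table
    return l1, l2, offset

print(split_address(0x00403004))  # (1, 3, 4)
```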
Figure 4.3: A 32-bit address with two page table fields and a two-level page table.
When a virtual address is presented to the MMU for translation, the hardware first checks
to see if its virtual page number is present in the TLB by comparing it to all the entries
simultaneously (i.e. in parallel). If a valid match is found and the access does not violate
the protection bits, the page frame is taken directly from the TLB, without going to the
page table. This is called a TLB hit.
Alternatively, if a TLB miss occurs, the MMU does an ordinary page table lookup. It
then evicts one of the entries from the TLB (if necessary) and replaces it with the page
table entry just looked up. The TLB access is retried and this time it will be successful.
Alternatively, software can handle a TLB miss, in which case it is up to the operating
system to walk the page tables to find the required entry. In either case, if no valid entry
is found in a page table, a page fault exception is raised which the operating system must
handle (by bringing the required data into memory).
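The hit/miss logic can be sketched in Python. A dict stands in for the parallel hardware comparison, and the FIFO-style eviction is an arbitrary choice for illustration (real TLBs use hardware replacement policies):

```python
class TLB:
    """A tiny software model of a TLB: a small cache of recent
    virtual-page-to-page-frame translations."""
    def __init__(self, capacity, page_table):
        self.capacity, self.page_table = capacity, page_table
        self.entries = {}            # vpn -> frame
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:      # TLB hit: no page table walk needed
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1             # TLB miss: walk the page table
        frame = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest entry
        self.entries[vpn] = frame    # a retry of this access would now hit
        return frame

tlb = TLB(capacity=2, page_table={0: 5, 1: 6, 2: 7})
for vpn in [0, 0, 1, 2, 0]:
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)  # 1 4
```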
The Not Recently Used (NRU) algorithm classifies each page into one of four categories,
based on its Referenced (R) and Modified (M) bits:
1. Not referenced, not modified
2. Not referenced, modified
3. Referenced, not modified
4. Referenced, modified
The algorithm removes a page at random from the lowest numbered non-empty category.
Simulating LRU in software Few machines have the hardware to support LRU so instead, a software solution called the Not Frequently Used (NFU) algorithm is used.
It requires a software counter associated with each page, initially zero. At each clock interrupt, the OS scans all the pages in memory. For each page, the Referenced bit, which
is 0 or 1, is added to the counter. The counters keep track of how often each page has
been referenced. When a page fault occurs, the frame with the lowest counter is chosen
for replacement.
The problem is that it never forgets the history. A modified algorithm, known as ageing,
maintains an 8-bit value for each page. After every tick, the bits are shifted to the right
and the reference bit is placed in the most significant position of the byte. Again, the
page with the lowest value is chosen to be replaced when necessary.
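One tick of the ageing algorithm is a one-line update per page; this sketch uses 8-bit counters and two illustrative pages:

```python
def age(counters, referenced):
    """One clock tick: shift each page's 8-bit counter right and put the
    Referenced bit in the most significant position."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (0x80 if referenced[page] else 0)

counters = {"A": 0b10000000, "B": 0b01000000}
age(counters, {"A": False, "B": True})
print(counters)  # {'A': 64, 'B': 160}

# The page with the lowest counter is the replacement victim.
victim = min(counters, key=counters.get)
print(victim)  # A: used longest ago, despite its earlier reference
```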
In a multiprogramming system, processes are frequently moved to disk to let other processes have a turn at the CPU. When a process is loaded back into memory, technically
nothing needs to be done; it will cause page faults until the working set is back in memory,
but allowing this to happen after every context switch is slow. Therefore, paging systems
often try to keep track of each process's working set and make sure that it is in memory
before letting the process run. This is called prepaging.
The working set algorithm needs to keep track of which pages are in the working set.
The set is defined as the pages used in the k most recent memory references. When a
page fault occurs, a page not in the working set can be evicted. Keeping a list of what
is in the working set is expensive, so an approximation is used: instead of counting back
k memory references, the set is defined as the pages used during the past τ milliseconds
of execution time. The amount of CPU time a process has used since it started is called
its current virtual time.
The hardware is assumed to set the R and M bits, as previously discussed, and a periodic
clock interrupt is assumed to cause software to clear the R bit on every tick. On every
page fault, the table is scanned to look for a suitable page to evict. As each entry is
processed, the R bit is examined:
If it is 1, the current virtual time is written into the Time of last use field in the page
table, indicating that the page was in use at the time the fault occurred. Since the page
has been referenced during the current clock tick, it is in the working set and not a
candidate for removal.
If it is 0, the page has not been referenced during the current clock tick and may be a
candidate for removal. To see whether or not it should be removed, its age (the current
virtual time minus its Time of last use) is computed and compared to τ. If the age is
greater than τ, the page is no longer in the working set. The new page is loaded here.
If R is 0 but the age is less than or equal to τ, the page is still in the working set. The
page is temporarily spared, but the oldest page is noted. If the entire table is scanned
without finding a candidate to evict, all pages are in the working set. In this case, if one
or more unreferenced (R = 0) pages were found, the oldest one is evicted. In the worst
case, all pages have been referenced during the current clock tick, so one is chosen at
random for removal, preferably a clean page, if one exists.
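One page-fault scan of the working-set algorithm can be sketched as follows. The value of τ and the page records are illustrative; "r" is the R bit and "last_use" is the Time of last use field:

```python
TAU = 50  # the working-set window τ, in ms of virtual time (arbitrary here)

def choose_victim(pages, current_virtual_time):
    """Scan the page table on a fault and pick a page to evict."""
    oldest = None
    for page in pages:
        if page["r"]:                                # referenced this tick:
            page["last_use"] = current_virtual_time  # still in the working set
            continue
        age = current_virtual_time - page["last_use"]
        if age > TAU:                                # aged out of the working set
            return page                              # evict it immediately
        if oldest is None or page["last_use"] < oldest["last_use"]:
            oldest = page                            # note the oldest spared page
    return oldest  # nothing aged out: fall back to the oldest unreferenced page

pages = [{"name": "A", "r": False, "last_use": 10},
         {"name": "B", "r": True,  "last_use": 90},
         {"name": "C", "r": False, "last_use": 80}]
print(choose_victim(pages, 100)["name"])  # A: unreferenced and older than TAU
```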
4.5.6 Segmentation
Paging suffers from internal fragmentation, that is, space inside an oversized page is
wasted. Segments, on the other hand, can vary in length, even during execution. For
example, the length of a segment may increase every time something is added (pushed) to
the stack. A segment is a logical entity and when programming, the user will see memory
as a set of objects, with no particular order. A segment might contain an array, or a
stack, and will usually contain only a single type of data.
So, using segments instead of pages makes it easier to have expanding/shrinking data
structures. Imagine a set of procedures in a paged memory system. They will all be
packed together in memory. If a procedure is modified and recompiled, resulting in it
being larger, all of the procedures after it will have their memory addresses shifted along.
On the other hand, a segmented memory will hold each procedure in its own segment,
starting at address 0. This means that if one procedure grows, it doesn't affect any of
the others. This greatly simplifies the linking up of procedures.
Segmentation also makes it easier to share code between processes, for example, a shared
library for printing graphics.
A segment is referenced using a two-part address containing the segment number and the
offset into that segment.
Consideration                                        Paging   Segmentation
Transparent to the programmer?                       Yes      No
Number of linear address spaces                      1        Many
Procedures and data separately protected?            No       Yes
Tables whose size fluctuates accommodated easily?    No       Yes
Sharing of procedures between users facilitated?     No       Yes
A segment table is maintained for each process. This table will contain many fields,
including the following:
Segment number
Access rights
Base address
Segment size
The table is part of the process context and is changed on each process switch. In the
same way as a page table, a segment table can either be represented in registers, or held
in main memory with a single register pointing to it. A register is also needed to store
the length of the segment table, since this value will differ for each process.
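Translating a two-part (segment, offset) address then amounts to a table lookup plus a bounds check; the table contents below are illustrative:

```python
# Toy per-process segment table: segment number -> (base address, size)
segment_table = {0: (0x1000, 0x400), 1: (0x8000, 0x200)}

def translate(segment, offset):
    """Translate a (segment, offset) pair, checking the offset against
    the segment's size as the hardware would."""
    base, size = segment_table[segment]
    if offset >= size:
        raise MemoryError("offset outside segment")  # hardware would raise a fault
    return base + offset

print(hex(translate(1, 0x10)))  # 0x8010
```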
As we learned earlier, dynamic partitioning, such as segmentation, suffers from external
fragmentation, which is solved using compaction. This occurs as segments come and go
from main memory, but what happens if we cannot fit all of the segments into memory
at once? This leads us onto the concept of paged segments.
4.5.8.1 Paged segments
A system of paged segments splits segments up into k pages and maintains a separate
page table for each segment. Unfortunately, this requires a lot of extra hardware and is
therefore not very practical.
4.5.8.2 Software segments
This is the method used by most modern operating systems. It simply uses paging, but
considers pages m to m + l to be a segment. Each process has a local descriptor
table which describes the program's segments, including its code, stack, data, etc. There
is also a global descriptor table which describes the operating system's segments.
5 I/O subsystem
One of the functions of an operating system is to manage the hardware devices. It should
issue commands, handle errors and provide an interface between the hardware and the rest
of the system.
Figure 5.1: A flowchart showing how data is read from an I/O device using the polled
I/O method.
A direct memory access controller is a mini-processor dedicated to I/O tasks. When the
CPU needs to transfer data, it issues a command to the DMA controller, which performs
the transfer and raises an interrupt once the whole thing has finished. This means that
the CPU is not interrupted mid-transfer.
The DMA controller contains registers that the CPU uses to issue commands to it. There
is a memory address register, a byte count register and control registers, which specify
the I/O port to use, whether the data is being read from the device or written to the
device, the transfer unit (byte at a time or word at a time) and the number of bytes to
transfer per burst.
Figure 5.2: A flowchart showing how data is read from an I/O device using the interrupt-driven I/O method.
5.4.1 Blocking
When a process makes a system call, the process is moved to a waiting for I/O queue.
When the call completes, the process is moved back to the Ready queue and is rescheduled.
The process will also be rescheduled if an error occurs. If an error does not occur, all
system calls will return all of the requested data. An example of an error is if the file was
not found, or the process does not have the required permission to access it.
5.4.2 Non-blocking
With a non-blocking call, the process is not rescheduled after making the call. Instead
of the process asking for, say, 100 bytes, it will request up to 100 bytes and instead of
the call returning all 100 bytes (or an error), it will return immediately with as much as
possible. Often this will be zero bytes, which is a valid success. Non-blocking I/O may
be useful for user-interface code which checks whether a key has been pressed.
Figure 5.3: A flowchart showing how data is read from an I/O device using a DMA
controller.
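The contrast with a blocking read can be seen with a pipe marked non-blocking in Python (a POSIX-only sketch; an empty non-blocking read surfaces as BlockingIOError, which here plays the role of the zero-byte result):

```python
import os

r, w = os.pipe()
os.set_blocking(r, False)   # mark the read end of the pipe non-blocking

try:
    data = os.read(r, 100)  # ask for *up to* 100 bytes
except BlockingIOError:
    data = b""              # nothing available yet: zero bytes, a valid success
print(len(data))            # 0

os.write(w, b"hello")
data2 = os.read(r, 100)     # returns what is available, not the full 100 bytes
print(data2)                # b'hello'
```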
5.4.3 Asynchronous
Similar to non-blocking calls, asynchronous calls do not make the process wait. Instead,
a signal (a software interrupt) is sent to the process when the data from the system call
is ready. This allows the process to get all of the requested data and continue executing
while the call takes place.
Consider a process trying to print some characters. The printer is a lot slower than the
memory, so instead of having the CPU sit and wait for the printer, the data can be copied
to a buffer.
There are different kinds of buffers:
Single buffering - the OS assigns a buffer to the user request
Double buffering - one buffer is emptied while a second is filled up, then they swap
Circular buffering - useful for buffering data streams, old data is overwritten if
consumer cannot keep up
5.6 Caching
We have already looked at cache memory and have learned that if a program needs to
read a memory word, the hardware checks to see if the data is in the cache before going
to main memory. What we have not discussed is how the operating system decides what
to cache.
6 File systems
The file system is the part of an operating system that deals with files, the units in which
information is stored on disk. The file system has two main parts: the directory service,
which maps file names to file identifiers and handles access and existence, and the storage
service, which provides the mechanism to store data on disk.
6.1 Files
A file is an abstract block of arbitrary information seen by a user. Files are used to
simplify the storage procedure so the user does not need to worry about how and where
the data is stored on disk.
Type - if the system enforces file types
Protection - permissions regarding read/write/execute
Time and user ID - who created and modified the file
This information is known as the meta-data and is stored in the file control block.
6.1.4 Hardlinks
A hardlink, also known as an alias, is a second name for a particular file. This allows
multiple files, with different names and in different directories, to point to the same data
on disk. The link does not have its own meta-data; it uses the same meta-data as the
original file. If the original file is moved, renamed or modified, the hardlink will continue
to refer to the same file. Hardlinks can only exist within a single file system and cannot
span mount points, in the same way that a file cannot be created in one file system with
its data stored in another; it does not make sense to do this.
Most operating systems do not allow hard links to directories, because this would create
the possibility of endless cycles.
When a file is deleted, how do we know if the data can be removed from the disk and the
space freed? Usually this isn't a problem because only one file will point to the data, and
if that file is deleted, then the data is no longer needed. However, with hard links, there
may be other references to the same data: although one file has been deleted, there may
be more files that still exist, linking to the same data. To overcome this problem, a
link counter can be used. This is incremented every time a hardlink is made to a file and
decremented every time one is removed. The OS can deallocate the space on disk only
when the link counter reaches zero.
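On a POSIX system the link counter is visible as `st_nlink`, which a short sketch can demonstrate (the file names here are illustrative):

```python
import os
import tempfile

# Demonstrate the link counter (st_nlink) on a POSIX system
d = tempfile.mkdtemp()
original = os.path.join(d, "a.txt")
with open(original, "w") as f:
    f.write("data")

alias = os.path.join(d, "b.txt")
os.link(original, alias)                       # second name, same i-node
count_after_link = os.stat(original).st_nlink  # 2: two names for one file

os.remove(original)                            # one name removed
count_after_remove = os.stat(alias).st_nlink   # 1: data still allocated
```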
In UNIX, /etc/passwd holds a list of password entries, each of the form
Username:Password:UID:GID:User ID info:Home directory:Shell.
A one-way function is used to encrypt passwords; that is, a function which is easy to
compute in one direction but hard to invert. This is a bit like a telephone directory:
it's very easy to convert a name to a phone number, but doing a lookup in the other
direction is difficult.
To login, the following steps are performed.
1. Get username
2. Get password
3. Encrypt password
4. Check encrypted password against the version in /etc/passwd
5. If they match, instantiate login shell
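The steps above can be sketched as follows. SHA-256 stands in for the one-way function here for illustration only; the real UNIX scheme uses crypt-style hashes, and the username and password are made up:

```python
import hashlib

# sha256 stands in for the one-way function (not the real crypt(3) scheme)
def one_way(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

# A toy password table standing in for /etc/passwd: it stores hashes only
passwd = {"jake": one_way("pa55word")}

def login(username: str, password: str) -> bool:
    stored = passwd.get(username)          # steps 1-2: get username, password
    # step 3-4: encrypt the supplied password, compare with the stored version
    return stored is not None and one_way(password) == stored

ok = login("jake", "pa55word")    # True: would instantiate a login shell
bad = login("jake", "guess")      # False
```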
The passwd file contains useful information; for example, the User ID info field can
contain data such as the user's full name. For this reason, everyone is granted read
permission. Unfortunately, this opens users up to attack: it allows other users of the
system to read their encrypted passwords and attempt to crack them offline. To solve
this, an x is placed in the password field in the passwd file, indicating that the encrypted
password is actually stored in /etc/shadow: a file readable by the superuser only.
6.2 Directories
A file system could simply store every file in the same place, with no structure. This would
be simple to implement but not good for organisation or easily setting permissions. A
directory structure (with more than one directory) is a more sensible approach.
One improvement from having a single directory is to implement a two-level directory,
where each user gets their own directory. From a single user's point of view, however,
this approach is just as bad as the first.
A hierarchical directory system is the most flexible: directories can contain more
directories! Files can be referred to using either an absolute path name, which starts at
the root directory (e.g. /usr/Jake/pictures/dog.jpg), or a relative path name, which
gives a file path relative to the current working directory.
UNIX has a distinguished root directory called /.
Each directory has two special entries: . and .. (pronounced dot and dotdot).
Dot refers to the directory itself while dotdot refers to its parent directory. These can
be used to access files higher up the hierarchy when using relative paths. For example,
../file.doc refers to the file with file name file.doc in the directory above the current
working directory.
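Resolving . and .. is a purely lexical operation, which can be sketched with Python's `posixpath` module (using the example paths from the text; real systems also account for symbolic links):

```python
import posixpath

# Lexically resolving . and .. against a current working directory
cwd = "/usr/Jake/pictures"
parent_file = posixpath.normpath(posixpath.join(cwd, "../file.doc"))
same_dir = posixpath.normpath(posixpath.join(cwd, "./dog.jpg"))

print(parent_file)  # /usr/Jake/file.doc
print(same_dir)     # /usr/Jake/pictures/dog.jpg
```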
Directories can be stored as files on disk, each with their own SFID.
Figure 6.1: The structure of an i-node in UNIX with addresses of 12 blocks with further
indirect addresses.
Owner   Group   World
r w x   r w x   r w x
Three bits are used for each of owner, group and world. The three bits correspond to
read, write and execute permissions respectively.
Directory permissions When the read permission is set for a directory, permission to list
the files in the directory is granted. When the write permission is set for a directory, users
have the ability to modify entries in the directory. This includes creating, deleting and
renaming files. Finally, when the execute permission is set for a directory, permission is
granted to access file contents and meta-data, but not to list the files inside the directory
(unless read is also set).
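The nine bits map directly onto the octal permission values used by commands such as chmod, as a small sketch shows:

```python
def mode_bits(rwx: str) -> int:
    """Convert a 9-character owner/group/world string such as 'rwxr-x--x'
    into the corresponding octal permission value."""
    assert len(rwx) == 9
    value = 0
    for ch in rwx:
        # each position contributes one bit: set unless the character is '-'
        value = (value << 1) | (ch != "-")
    return value

print(oct(mode_bits("rwxr-x--x")))  # 0o751: owner rwx, group r-x, world --x
print(oct(mode_bits("rw-r--r--")))  # 0o644: a common default for files
```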
6.3.1.3 Disk layout
Figure 6.3: The layout of blocks on a disk using the UNIX file system.
A UNIX disk is made up of a boot block, used to start the computer, followed by a number
of partitions, also made up of blocks. Block 1, in any partition, is the superblock, which
contains the number of i-nodes, the number of blocks on the disk and the start of the list
of free blocks. After the superblock is the set of i-nodes and then the data blocks.
6.3.1.4 Pipeline
In UNIX, a pipeline is a set of processes where the output of one process is used as the
input to the next one. Each pair of processes is connected by a pipe, which is a first-in
first-out (FIFO) data structure. If process A wants to send data to process B, it writes
the data to the pipe as though it were a file. Process B can then read the pipe, also as
though it were a file.
Pipes are created using the pipe system call, which creates the pipe and returns a pair
of file descriptors referring to the read and write ends of the pipe.
The UNIX operating system buffers the first process's output in the pipe. If the buffer
fills up, then the first process is blocked until the second process is ready to receive
again.
A parent process can open an anonymous pipe and have a child process inherit the other
end of the pipe, or it can create several new processes and form a pipeline. A named
pipe is used for inter-process communication. A named pipe exists beyond the life of the
process and must be deleted once it is no longer being used. A named pipe also appears
as a file to a process.
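The pipe system call described above can be sketched in Python, where `os.pipe` returns the pair of file descriptors directly:

```python
import os

# pipe() returns two file descriptors: a read end and a write end
read_fd, write_fd = os.pipe()

os.write(write_fd, b"hello through the pipe")  # writer treats it like a file
os.close(write_fd)                             # signals end-of-stream to the reader

data = os.read(read_fd, 1024)                  # reader treats it like a file too
os.close(read_fd)
print(data)
```

In a real pipeline the two descriptors would be held by different processes, with a child inheriting one end after fork.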
Figure 6.4: A file allocation table storing files as linked lists of disk clusters.
Figure 6.4 shows how the directory entry for a file gives an offset into the FAT. In the
case of File A, the first block of the file is in cluster 2, and by following the pointers, we
can see that the rest of the file is in cluster 3 and then cluster 5, where the file ends.
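Following a file's chain of clusters through the FAT can be sketched with a toy table mirroring File A in Figure 6.4 (the entries are illustrative, not a real on-disk FAT):

```python
# A toy FAT: fat[cluster] gives the next cluster of the file, or None at
# end-of-file. These entries mirror File A in Figure 6.4: clusters 2 -> 3 -> 5.
EOF = None
fat = {2: 3, 3: 5, 5: EOF}

def clusters_of(first_cluster):
    """Follow the FAT pointers from a file's first cluster to its last."""
    chain = []
    cluster = first_cluster
    while cluster is not None:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(clusters_of(2))  # [2, 3, 5]
```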
FAT16 could only handle 2GB partitions and used 32KB clusters (which is rather large,
and large cluster sizes lead to internal fragmentation when the files stored in them do
not use up all of the space).
FAT32 was an improvement and could handle 8GB partitions with 4KB clusters. Further
enhancements include the ability to place the root directory anywhere in a partition (in
FAT16, the root directory had to immediately follow the FATs) and the ability to use a
backup copy of the FAT instead of the default. Of course, space is used more efficiently
than in FAT16 because of the smaller cluster size.
6.3.2.2 NTFS
Newer versions of Windows no longer use FAT file systems, but instead use NTFS (NT
file system).
The fundamental structure in NTFS is a volume. A volume may be a portion of a disk,
a full disk, or a collection of disks. Each volume is a linear sequence of fixed-size clusters.
Clusters are referred to using a 64-bit number which gives the offset from the start of the
volume.
File names in NTFS are limited to 255 characters while full file paths are limited to
32767 characters. Furthermore, file names use Unicode, allowing non-Latin characters to
be used.
All of the files and directories on a volume are stored in the Master File Table (MFT)
which is made up of 1KB records. Each record describes one file or directory by listing
the file's attributes (including the file name) and the disk addresses where its blocks are
located. If the file is too large, a second record can be used to continue listing the blocks.
The MFT is a file itself and can be stored anywhere in a volume. The OS finds the file
by looking in the boot block of the disk, where the location of the MFT is set when the
OS is installed. The MFT can also grow as more records are required.
To aid recovery after a system crash, all file system data structure updates are performed
inside transactions. Before a data structure is altered, the transaction writes a log record
that contains redo and undo information. After the transaction, the log is updated to say
that it succeeded. If the system crashes, the log can be processed to restore the system
to a consistent state.
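A toy sketch in the spirit of this redo/undo logging (not NTFS's actual on-disk format; the record layout here is invented for illustration):

```python
# A toy write-ahead log: each record stores enough to redo or undo one
# update, and a "commit" marker is appended once the transaction succeeds.
log = []
data = {"size": 100}

def update(key, new_value):
    log.append({"key": key, "undo": data.get(key), "redo": new_value})
    data[key] = new_value
    log.append("commit")   # the transaction completed

def recover():
    # Walk the log backwards, undoing any update never followed by a commit
    for record in reversed(log):
        if record == "commit":
            break          # everything earlier was committed
        data[record["key"]] = record["undo"]

update("size", 200)
# Simulate a crash mid-transaction: the log record was written, the data
# structure was altered, but the commit marker never made it to the log.
log.append({"key": "size", "undo": data["size"], "redo": 300})
data["size"] = 300

recover()
print(data["size"])  # 200: the uncommitted update was rolled back
```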
Other features of NTFS include automatic compression when files are written to disk and
file encryption.
File systems need to be mounted in order to be used. One file system will be treated
by the OS as the root file system. Subsequent file systems are mounted on an existing
directory in an already mounted file system (usually the root). The directory is called a
mount point. Figure 6.5 shows a file system mounted on the /usr directory of the root
file tree. The previous contents of the /usr directory become invisible and it now refers
to the root directory of the newly mounted file system.
Figure 6.5: A file system mounted on the file tree of the root file system.
Older versions of Windows do not mount all file systems in a single tree but instead
maintain a forest of file trees. That is, you get A: and C: directories, which have no
parents. In newer versions of Windows, My Computer is the parent of each file system.
7 Protection
With confidential information being stored on computers, protection and security is a
vital part of operating system design.
Some techniques for protecting the system have already been discussed. These include
having two modes of operation (user mode and kernel mode), memory-management
hardware which keeps processes' address spaces separate, and access control on files.
7.1 Authentication
It is important for modern operating systems to authenticate users, to avoid unauthorised
access and to make sure the correct services, files, etc. are presented to the user. Using a
password, along with some sort of username, is the most common form of authentication.
When designing a login field, it is best to obscure the characters of the password as they
are typed in. An even better method is to hide the characters altogether, but the user
often wants visual feedback regarding how many characters they've typed in so far.
There are a few common mistakes made when authenticating a username and password.
Firstly, it is important that any incorrect username and password combination takes the
same amount of time to return with the error message. For example, if a password of 20
characters is typed in, the system may compare each character in turn with the version
of the password on file. If the first character is wrong, it may return immediately with an
error message, however if the 20th character is wrong, it will have taken a small amount
of time to check the previous 19 characters before returning the error. By measuring this
amount of time, crackers can deduce how many characters in the password are correct.
Secondly, error messages such as "invalid username" or "invalid password" can greatly
help a cracker compared to a single error message that says "invalid username or
password". This kind of implementation tells the cracker whether they have a correct
username or not, which is much more useful to them than just being told that the
username and password combination is wrong.
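The timing problem described above is usually solved with a constant-time comparison, sketched here using Python's standard library:

```python
import hmac

def check_password(supplied: str, stored: str) -> bool:
    # compare_digest examines the whole input regardless of where the first
    # mismatch occurs, so the time taken does not reveal how many leading
    # characters of the password were correct
    return hmac.compare_digest(supplied.encode(), stored.encode())

match = check_password("secret", "secret")    # True
nomatch = check_password("sexxxx", "secret")  # False, in roughly the same time
```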
How do we store the passwords in the computer? The simplest method is to keep a file
listing all of the username and password pairs. The obvious safeguard is to make this
file accessible only to the login program. This isn't totally secure, though: if somebody
did manage to get hold of the file, everybody's passwords would be exposed. A better
method is to scramble the passwords before storing them by using a one-way function.
7.1.2 Salts
Adding a salt to a password before hashing it can help to make rainbow tables ineffective.
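A minimal sketch of the idea, using the fixed salt "salt" from the worked example below (real systems use a random per-user salt stored alongside the hash, and SHA-256 here stands in for the system's one-way function):

```python
import hashlib

# Append the salt before hashing (illustrative scheme, not real crypt(3))
def hash_password(password: str, salt: str = "salt") -> str:
    return hashlib.sha256((password + salt).encode()).hexdigest()

stored = hash_password("pa55word")   # what the system keeps on file

# A rainbow table precomputed for plain sha256("pa55word") no longer matches
unsalted = hashlib.sha256(b"pa55word").hexdigest()
print(unsalted != stored)   # True: the salt defeats the precomputed table
```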
Let's say that the system adds the word salt onto each password before generating the
hash, and Alice uses pa55word as her password. When Alice enters her password, the
system appends the salt, hashes the result and compares it with the stored hash.
7.3 Viruses
A computer virus is a program that can replicate itself and spread through computers by
attaching itself to an existing program. A virus will make unauthorised changes to the
computer, whether these changes are done with malicious intentions or not. Either way,
a virus is an unwanted piece of software.
7.4 Worms
A worm is a program that replicates itself and spreads to other computers, but unlike
a virus, does not need to attach itself to an existing program. A worm is a standalone
program that causes harm to computers and networks.