Jake Wright
November 7, 2013
Contents

1 Introduction to operating systems
1.1 Abstract view of an operating system
1.1.1 Multiprogramming
1.1.2 Time-sharing
1.1.3 Monolithic operating systems
1.1.4 Dual-mode operation
1.1.4.1 Real mode and protected mode
1.2 Bits and bytes
1.2.1 Words
1.3 Endianness
1.4 Memory
1.5 Protecting I/O, memory and the CPU
1.5.1 Protecting memory
1.5.2 Protecting the CPU
1.6 Kernels and microkernels

2 Processes
2.1 Process creation
2.1.1 The fork system call
2.2 Process termination
2.3 Process hierarchies
2.4 Process states
2.5 Implementation of processes
2.5.1 Process control block
2.5.2 Interrupt vectors
2.6 Threads
2.7 The boot process
2.7.1 Booting UNIX
2.7.2 Booting Windows

3 Scheduling
3.1 Context switch
3.1.1 Heavyweight and lightweight processes
3.2 Process behaviour
3.3 When to schedule
3.4 Pre-emptive and non-pre-emptive schedulers
3.5 Idle system
3.6 Scheduling algorithm goals
3.7 Algorithms
3.7.1 First-come first-served
3.7.2 Shortest job first
3.7.3 Shortest remaining time first
3.7.4 Round-robin
3.8 Static and dynamic priority scheduling algorithms
3.8.1 Static priority scheduling
3.8.2 Dynamic priority scheduling

4 Memory management
4.1 Basic memory management
4.1.1 Monoprogramming without swapping or paging
4.1.2 Multiprogramming with fixed partitions
4.2 The address binding problem
4.2.1 Compile time
4.2.2 Load time
4.2.3 Run-time
4.3 Mapping logical to physical addresses at run-time
4.4 Dynamic Partitioning
4.4.1 Compaction
4.5 Virtual Memory
4.5.1 Paging
4.5.1.1 Page tables
4.5.1.2 Page table structure
4.5.1.3 Paging pros and cons
4.5.1.4 Multilevel page tables
4.5.1.5 Structure of a page table entry
4.5.2 TLBs (Translation Lookaside Buffers)
4.5.3 Inverted page tables
4.5.4 Shared pages
4.5.5
4.5.6 Segmentation
4.5.7
4.5.8

5 I/O subsystem
5.1 I/O Devices
5.2 Memory Mapped I/O
5.3 Data transfer from a device driver point of view
5.3.1 Polled I/O
5.3.2 Interrupt-driven I/O
5.3.3 Direct memory access (DMA)
5.4 Data transfer from a user's point of view
5.4.1 Blocking
5.4.2 Non-blocking
5.4.3 Asynchronous
5.5 I/O Buffering
5.6 Caching
5.6.1 Temporal locality of reference
5.6.2 Spatial locality of reference

6 File systems
6.1 Files
6.1.1 File names
6.1.2 File meta-data
6.1.3 File operations
6.1.4 Hardlinks
6.1.5 Symbolic (soft) links
6.2
6.3

7 Protection
7.1 Authentication
7.1.1 One-way functions
7.1.2 Salts
7.2 Trojan Horses
7.3 Viruses
7.3.1 Memory resident viruses
7.3.2 Boot sector viruses
7.4 Worms
Figure 1.1: An abstract view of the operating system sitting in-between the hardware
layer and the user program layer.
1.1.1 Multiprogramming
Originally, when the current job paused to wait for an I/O operation to complete, the
CPU would sit and wait for it. With I/O bound jobs, this can waste a lot of time.
The solution was multiprogramming which involves partitioning memory into several
pieces, with a different job in each partition. While one job was waiting for I/O to
complete, another job could be using the CPU.
1.1.2 Time-sharing
Third generation operating systems were well suited for big scientific calculations, but
programmers wanted to be able to have the machine to themselves for a few hours, so
they could debug their programs quickly. This desire for quick response times paved
the way for time-sharing, a variation of multiprogramming, in which each user has an
online terminal.
1.2.1 Words
A word refers to the size of the processor's registers. That is, a word on a 32-bit processor
is 4 bytes (32 bits), but on a 64-bit processor, a word will be 64 bits long. A word is
therefore the number of bits that the processor can handle as a single unit.
1.3 Endianness
Endianness refers to the order of bytes in a word. A big-endian system stores the most
significant byte first while a little-endian system stores the least significant byte first.
Consider the decimal number 1000. It is common to represent numbers in hexadecimal
(base 16) because each hexadecimal digit represents four bits (so two hex digits represent
one byte), so let's first convert to that:

n      n DIV 16   n MOD 16
1000   62         8
62     3          14 (E)
3      0          3

Reading the remainders from bottom to top, 1000 in decimal is 3E8 in hexadecimal.
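The conversion, and the difference between the two byte orders, can be checked with a short Python sketch (the struct format characters > and < select big- and little-endian packing):

```python
import struct

# Convert decimal 1000 to hexadecimal by repeated DIV/MOD 16, as in the table.
n, digits = 1000, []
while n > 0:
    digits.append("0123456789ABCDEF"[n % 16])  # each remainder is the next hex digit
    n //= 16
hex_value = "".join(reversed(digits))           # remainders read bottom-up
print(hex_value)  # 3E8

# Pack 0x3E8 into a 4-byte word in each byte order.
big = struct.pack(">I", 0x3E8)     # big-endian: most significant byte first
little = struct.pack("<I", 0x3E8)  # little-endian: least significant byte first
print(big.hex())     # 000003e8
print(little.hex())  # e8030000
```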
1.4 Memory
Because faster memory is more expensive, only a small amount of it is used. We then
encounter a hierarchy of memory, getting slower but larger in size as we move away from
the CPU.
2 Processes
A process is a program in execution. Each program has its own address space: a list of
memory locations from some minimum (usually 0) to some maximum, which the process
can read and write to. The address space contains the executable program, the program's
data, and its stack. Also associated with each process is a set of registers, including the
program counter, stack pointer, and other hardware registers. Conceptually, each process
has its own virtual CPU. In reality, the CPU switches back and forth from process to
process (this rapid switching back and forth is multiprogramming).
The last situation in which processes are created applies to batch systems. Here users
can submit batch jobs to the system. When the operating system decides that it has the
resources to run another job, it creates a new process and runs the next job from the
queue.
Figure 2.1: A process can be in running, blocked, or ready state. Processes entering the
system are in the new and exit states.
The process control block contains the process's state, its program counter, stack pointer, memory allocation, the status of its open files,
its accounting and scheduling information, and everything else about the process that
must be saved when the process is switched from running to ready or blocked state so
that it can be restarted later as if it had never been stopped.
6. An assembly language procedure starts up the new current process.
2.6 Threads
So far, we have assumed that each process executes separately from the others and each
process has its own address space. However, sometimes it is useful to split processes up
and have each part run in parallel, but in the same address space. To implement
this idea, we introduce the concept of threads.
In a system using the thread model, a process can be thought of as a group of resources.
A process still has its own virtual address space, but now contains threads. Each thread
has its own scheduling state (ready, running, blocked, etc.), registers and stack.
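A short Python sketch illustrates the point: threads share their process's data while each has its own flow of control, and the lock is needed precisely because the address space is shared (the counter and thread count are arbitrary):

```python
import threading

counter = {"value": 0}   # shared data: all threads see the same address space
lock = threading.Lock()

def worker():
    # Each thread has its own stack and registers, but updates shared memory.
    for _ in range(100_000):
        with lock:       # without the lock, increments could interleave and be lost
            counter["value"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])  # 400000
```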
When the kernel has finished loading, the init process is executed. The init process reads the
file /etc/inittab to find out whether the system should enter single-user mode or multi-user
mode, and then starts background processes.
In multi-user mode, init forks a getty process for each terminal listed in /etc/ttys. Getty
prints the login prompt on screen. When somebody supplies a username, /bin/login is
executed, which asks for the password and authenticates the user. If authentication is
successful, login forks the user's shell.
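The fork/exec chain described above (init forks getty, login execs the shell) can be sketched with a minimal POSIX-only Python example; /bin/echo stands in for the user's shell here:

```python
import os

# Parent forks a child; the child replaces its image with a new program,
# just as login execs the user's shell after authentication.
pid = os.fork()
if pid == 0:
    # Child process: os.fork() returned 0 here.
    os.execv("/bin/echo", ["/bin/echo", "child running"])
else:
    # Parent process: pid is the child's PID; wait for it to terminate,
    # like init waiting on its children.
    _, status = os.waitpid(pid, 0)
    print("child", pid, "exited with status", os.WEXITSTATUS(status))
```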
3 Scheduling
The part of the operating system that decides which process to run next is called the
scheduler. The scheduler has an important decision to make each time a new process
is to be run. The chosen process has a large effect on the perceived performance of the
system and, in addition to picking the right process to run, the scheduler also has to
worry about making efficient use of the CPU, because process switching is expensive.
Processes can be described as either:
I/O-bound - spends more time doing I/O than computation; has many short CPU
bursts.
CPU-bound - spends more time doing computations; has a few very long CPU
bursts.
Enforcing the system's policies is also important. If the local policy is that safety
control processes get to run whenever they want to, the scheduler has to make sure this
policy is enforced.
Another goal is keeping all parts of the system busy. If the CPU and all I/O devices
can be kept running all the time, more work gets done per second.
Throughput is the number of jobs per hour that the system completes. It is desirable
to maximise this.
Turnaround time is the average time taken to complete a job. It measures how long the
average user has to wait for the output. It is desirable to minimise this. In an interactive
system, this is known as response time. That is, the time between issuing a command
and getting the result.
Finally, it is necessary to minimise the waiting time, i.e. the amount of time a process
has been waiting in the ready queue.
Sensible scheduling strategies might be to:
Maximise throughput or CPU utilisation
Minimise average turnaround time, waiting time and response time
3.7 Algorithms
There are four main scheduling algorithms that we will discuss. Note that a burst time is
the time between scheduling and a process giving up its time slice. When a process gives
up its time slice, it is known as yielding. That is, it is essentially saying "I have nothing
useful to do, please schedule something else."
3.7.4 Round-robin
Round-robin is another pre-emptive algorithm. Each process is given a time-slice (called
a quantum), after which the process is put on the back of the queue and the next process
is scheduled from the front of the queue. If the quantum size is too large, the algorithm
tends towards simulating FCFS. If it is too small, the context switch overhead is too high
to make the algorithm efficient. A quantum size is typically 10ms.
Round-robin is fair - every process gets 1/nth of the CPU.
No process is kept waiting for longer than any other process.
Typically has a higher average turnaround time than SRTF, but has a better average
response time.
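A round-robin run can be simulated in a few lines of Python. This is a sketch: all processes are assumed to arrive at time 0, and context-switch overhead is ignored, so a process's turnaround time equals its completion time.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling; returns turnaround time per process."""
    queue = deque(burst_times.items())   # ready queue of (name, remaining burst)
    clock, turnaround = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)    # run one quantum, or less if it finishes
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the queue
        else:
            turnaround[name] = clock     # finished: record completion time
    return turnaround

print(round_robin({"A": 3, "B": 5, "C": 2}, quantum=2))  # {'C': 6, 'A': 7, 'B': 10}
```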
4 Memory management
Many processes are held in memory simultaneously, and every process has, in memory:
A code segment (or text segment), containing the program's instructions
A data segment (read-write), containing global and static variables
A stack, containing parameters, automatic and temporary variables
A heap, for dynamically allocated variables
The operating system also needs memory for itself. It is up to the memory management
subsystem to handle:
1. Relocation
2. Allocation
3. Protection
4. Sharing
5. Logical organisation
6. Physical organisation
4.2.3 Run-time
Hardware can be used to automatically translate between program addresses and real
addresses at run-time. No changes are required to the program code. Providing the
necessary hardware is available, this is the most flexible scheme.
Best fit - search the entire list to find the best-fitting hole, i.e. the smallest hole which
is large enough
Worst fit - counterintuitively, allocate the largest hole (again, searching the whole
list) to avoid filling memory with tiny, useless holes
Allocating more memory than necessary would be done if a process's data segment is
expected to grow. When a process is swapped in or out of memory, it is wasteful to
also swap extra memory so only the memory actually in use should be swapped. If
processes can have two growing segments, for example, the data segment (used as a heap
for variables that are dynamically allocated and released) and a stack segment (for the
normal local variables and return addresses) an alternate arrangement, as shown in figure
4.1, is convenient. We can have the program, and then the data segment, and then the
stack segment, with extra space allocated in between the data and the stack segments,
which can be used by either one. The data segment can grow up into the space while the
stack segment can grow down.
When a process terminates, its memory returns to the free list, coalescing holes together
where appropriate.
Figure 4.1: The structure of a process address space with extra space allocated to allow
the data and stack segments to expand.
External fragmentation can occur with dynamic partitioning. This is when the total
available memory is sufficient but it is unusable because it is split into many holes. The
solution to this is compaction.
4.4.1 Compaction
When swapping creates multiple holes in memory, it is possible to combine them all into
one big one by moving all the processes downward as far as possible. This technique is
known as memory compaction. It is usually not done because it requires a lot of CPU
time. It also requires run-time relocation.
4.5.1 Paging
Addresses can be generated by a program using indexing, base registers, segment registers,
and other ways. These program-generated addresses are called virtual addresses and
form the virtual address space. On computers without virtual memory, the virtual
address is put directly onto the memory bus and causes the physical memory word with the
same address to be read or written. When virtual memory is used, the virtual addresses
do not go directly to the memory bus. Instead, they go to the Memory Management
Unit (MMU) that maps the virtual addresses onto the physical memory addresses.
The virtual address space is divided up into units called pages. The corresponding units
in the physical memory are called page frames. The pages and page frames are always
the same size (typically 4 KB or 8 KB). In hardware, a Present/absent bit keeps track
of which pages are physically present in memory. If a program tries to use an unmapped
page, the MMU causes the CPU to trap to the operating system. This trap is called a
page fault. The OS picks a little-used page frame and writes its contents back to the
disk. It then fetches the page just referenced into the page frame just freed, changes the
map, and restarts the trapped instruction.
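The MMU's mapping can be sketched in Python with a toy page table; the table contents and the 4 KB page size below are illustrative assumptions:

```python
PAGE_SIZE = 4096  # 4 KB pages

# Toy page table: virtual page number -> (present, page frame number)
page_table = {0: (True, 2), 1: (True, 7), 2: (False, None)}

def translate(vaddr):
    """Map a virtual address to a physical one, mimicking what the MMU does."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split into page number + offset
    present, frame = page_table[vpn]
    if not present:
        # In hardware this is where the MMU makes the CPU trap to the OS.
        raise LookupError("page fault: page %d not in memory" % vpn)
    return frame * PAGE_SIZE + offset        # same offset within the page frame

print(hex(translate(0x1234)))  # 0x7234: page 1 maps to frame 7, offset 0x234 kept
```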
Figure 4.2: The internal operation of the MMU with 16 4 KB pages.
The simplest design is to have a single page table consisting of an array of fast hardware
registers with one entry for each virtual page. When a process is started, the OS loads
the registers with the process's page table, taken from a copy kept in main memory.
This method is straightforward and requires no memory references during mapping. A
disadvantage is that it is expensive if the page table is large. Having to load the full page
table at every context switch hurts performance.
Alternatively, the page table can be entirely in main memory. In this case, only one
register is needed in the MMU to point to the start of the page table (PTBR - page
table base register). Only one register needs to be changed when context switching but
more memory references are needed to read page table entries during execution of an
instruction.
Memory allocation is easier, no external fragmentation occurs and there is a clear separation between user and system view of memory usage. However, the operating system
must keep a page table per process, internal fragmentation occurs and there is an
additional overhead when context switching.
To get around the problem of having to store huge page tables in memory all the time,
many computers use a multilevel page table. Each virtual address is split into the level
2 page table offset, the level 1 page table offset, and the page offset.
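The split can be sketched for the common 10/10/12 layout of a 32-bit address (other architectures divide the bits differently; this layout is an assumption for illustration):

```python
def split_address(vaddr):
    """Split a 32-bit virtual address into level-1 index, level-2 index
    and page offset, assuming a 10/10/12-bit layout."""
    offset = vaddr & 0xFFF        # low 12 bits: offset within the 4 KB page
    l2 = (vaddr >> 12) & 0x3FF    # next 10 bits: index into a level-2 table
    l1 = (vaddr >> 22) & 0x3FF    # top 10 bits: index into the level-1 table
    return l1, l2, offset

print(split_address(0x00403004))  # (1, 3, 4)
```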
Figure 4.3: A 32-bit address with two page table fields and a two-level page table.
When a virtual address is presented to the MMU for translation, the hardware first checks
to see if its virtual page number is present in the TLB by comparing it to all the entries
simultaneously (i.e. in parallel). If a valid match is found and the access does not violate
the protection bits, the page frame is taken directly from the TLB, without going to the
page table. This is called a TLB hit.
Alternatively, if a TLB miss occurs, the MMU does an ordinary page table lookup. It
then evicts one of the entries from the TLB (if necessary) and replaces it with the page
table entry just looked up. The TLB access is retried and this time it will be successful.
Alternatively, software can handle a TLB miss, in which case it is up to the operating
system to walk the page tables to find the required entry. In either case, if no valid entry
is found in a page table, a page fault exception is raised which the operating system must
handle (by bringing the required data into memory).
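The hit/miss logic can be sketched in Python. A dict stands in for the parallel hardware comparison, and the FIFO-style eviction is an arbitrary choice for illustration (real TLBs use hardware replacement policies):

```python
class TLB:
    """A tiny software model of a TLB: a small cache of recent
    virtual-page-to-page-frame translations."""
    def __init__(self, capacity, page_table):
        self.capacity, self.page_table = capacity, page_table
        self.entries = {}            # vpn -> frame
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:      # TLB hit: no page table walk needed
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1             # TLB miss: walk the page table
        frame = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest entry
        self.entries[vpn] = frame    # a retry of this access would now hit
        return frame

tlb = TLB(capacity=2, page_table={0: 5, 1: 6, 2: 7})
for vpn in [0, 0, 1, 2, 0]:
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)  # 1 4
```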
The Not Recently Used (NRU) algorithm classifies each page into one of four categories,
based on its Referenced (R) and Modified (M) bits:
1. Not referenced, not modified
2. Not referenced, modified
3. Referenced, not modified
4. Referenced, modified
The algorithm removes a page at random from the lowest numbered non-empty category.
Simulating LRU in software Few machines have the hardware to support LRU so instead, a software solution called the Not Frequently Used (NFU) algorithm is used.
It requires a software counter associated with each page, initially zero. At each clock interrupt, the OS scans all the pages in memory. For each page, the Referenced bit, which
is 0 or 1, is added to the counter. The counters keep track of how often each page has
been referenced. When a page fault occurs, the frame with the lowest counter is chosen
for replacement.
The problem is that it never forgets the history. A modified algorithm, known as ageing,
maintains an 8-bit value for each page. After every tick, the bits are shifted to the right
and the reference bit is placed in the most significant position of the byte. Again, the
page with the lowest value is chosen to be replaced when necessary.
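One tick of the ageing algorithm is a one-line update per page; this sketch uses 8-bit counters and two illustrative pages:

```python
def age(counters, referenced):
    """One clock tick: shift each page's 8-bit counter right and put the
    Referenced bit in the most significant position."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (0x80 if referenced[page] else 0)

counters = {"A": 0b10000000, "B": 0b01000000}
age(counters, {"A": False, "B": True})
print(counters)  # {'A': 64, 'B': 160}

# The page with the lowest counter is the replacement victim.
victim = min(counters, key=counters.get)
print(victim)  # A: used longest ago, despite its earlier reference
```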
In a multiprogramming system, processes are frequently moved to disk to let other processes have a turn at the CPU. When a process is loaded back into memory, technically
nothing needs to be done; it will cause page faults until the working set is back in memory,
but allowing this to happen after every context switch is slow. Therefore, paging systems
often try to keep track of each process's working set and make sure that it is in memory
before letting the process run. This is called prepaging.
The working set algorithm needs to keep track of which pages are in the working set.
The set is defined as the pages used in the k most recent memory references. When a
page fault occurs, a page not in the working set can be evicted. Keeping a list of what
is in the working set is expensive, so an approximation is used: instead of counting back
k memory references, the set is defined as the pages used during the past τ milliseconds
of execution time. The amount of CPU time a process has used since it started is called
its current virtual time.
The hardware is assumed to set the R and M bits, as previously discussed, and a periodic
clock interrupt is assumed to cause software to clear the R bit on every tick. On every
page fault, the table is scanned to look for a suitable page to evict. As each entry is
processed, the R bit is examined:
If it is 1, the current virtual time is written into the Time of last use field in the page
table, indicating that the page was in use at the time the fault occurred. Since the page
has been referenced during the current clock tick, it is in the working set and not a
candidate for removal.
If it is 0, the page has not been referenced during the current clock tick and may be a
candidate for removal. To see whether or not it should be removed, its age (the current
virtual time minus its Time of last use) is computed and compared to τ. If the age is
greater than τ, the page is no longer in the working set. The new page is loaded here.
If R is 0 but the age is less than or equal to τ, the page is still in the working set. The
page is temporarily spared, but the oldest page is noted. If the entire table is scanned
without finding a candidate to evict, all pages are in the working set. In this case, if one
or more unreferenced (R = 0) pages were found, the oldest one is evicted. In the worst
case, all pages have been referenced during the current clock tick, so one is chosen at
random for removal, preferably a clean page, if one exists.
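One page-fault scan of the working-set algorithm can be sketched as follows. The value of τ and the page records are illustrative; "r" is the R bit and "last_use" is the Time of last use field:

```python
TAU = 50  # the working-set window τ, in ms of virtual time (arbitrary here)

def choose_victim(pages, current_virtual_time):
    """Scan the page table on a fault and pick a page to evict."""
    oldest = None
    for page in pages:
        if page["r"]:                                # referenced this tick:
            page["last_use"] = current_virtual_time  # still in the working set
            continue
        age = current_virtual_time - page["last_use"]
        if age > TAU:                                # aged out of the working set
            return page                              # evict it immediately
        if oldest is None or page["last_use"] < oldest["last_use"]:
            oldest = page                            # note the oldest spared page
    return oldest  # nothing aged out: fall back to the oldest unreferenced page

pages = [{"name": "A", "r": False, "last_use": 10},
         {"name": "B", "r": True,  "last_use": 90},
         {"name": "C", "r": False, "last_use": 80}]
print(choose_victim(pages, 100)["name"])  # A: unreferenced and older than TAU
```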
4.5.6 Segmentation
Paging suffers from internal fragmentation, that is, space inside an oversized page is
wasted. Segments, on the other hand, can vary in length, even during execution. For
example, the length of a segment may increase every time something is added (pushed) to
the stack. A segment is a logical entity and when programming, the user will see memory
as a set of objects, with no particular order. A segment might contain an array, or a
stack, and will usually contain only a single type of data.
So, using segments instead of pages makes it easier to have expanding/shrinking data
structures. Imagine a set of procedures in a paged memory system. They will all be
packed together in memory. If a procedure is modified and recompiled, resulting in it
being larger, all of the procedures after it will have their memory addresses shifted along.
On the other hand, a segmented memory will hold each procedure in its own segment,
starting at address 0. This means that if one procedure grows, it doesn't affect any of
the others. This greatly simplifies the linking up of procedures.
Segmentation also makes it easier to share code between processes, for example, a shared
library for printing graphics.
A segment is referenced using a two-part address containing the segment number and the
offset into that segment.
Consideration                                        Paging   Segmentation
Transparent to the programmer?                       Yes      No
Number of linear address spaces                      1        Many
Procedures and data separately protected?            No       Yes
Tables whose size fluctuates accommodated easily?    No       Yes
Sharing of procedures between users facilitated?     No       Yes
A segment table is maintained for each process. This table will contain many fields,
including the following:
Segment number
Access rights
Base address
Segment size
The table is part of the process context and is changed on each process switch. In the
same way as a page table, a segment table can either be represented in registers, or held
in main memory with a single register pointing to it. A register is also needed to store
the length of the segment table, since this value will differ for each process.
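Translating a two-part (segment, offset) address then amounts to a table lookup plus a bounds check; the table contents below are illustrative:

```python
# Toy per-process segment table: segment number -> (base address, size)
segment_table = {0: (0x1000, 0x400), 1: (0x8000, 0x200)}

def translate(segment, offset):
    """Translate a (segment, offset) pair, checking the offset against
    the segment's size as the hardware would."""
    base, size = segment_table[segment]
    if offset >= size:
        raise MemoryError("offset outside segment")  # hardware would raise a fault
    return base + offset

print(hex(translate(1, 0x10)))  # 0x8010
```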
As we learned earlier, dynamic partitioning, such as segmentation, suffers from external
fragmentation, which is solved using compaction. This occurs as segments come and go
from main memory, but what happens if we cannot fit all of the segments into memory
at once? This leads us onto the concept of paged segments.
4.5.8.1 Paged segments
A system of paged segments splits segments up into k pages and maintains a separate
page table for each segment. Unfortunately, this requires a lot of extra hardware and is
therefore not very practical.
4.5.8.2 Software segments
This is the method used by most modern operating systems. It simply uses paging, but
considers pages m to m + l to be a segment. Each process has a local descriptor
table which describes the program's segments, including its code, stack, data, etc. There
is also a global descriptor table which describes the operating system's segments.
5 I/O subsystem
One of the functions of an operating system is to manage the hardware devices. It should
issue commands, handle errors and provide an interface between the hardware and the rest
of the system.
Figure 5.1: A flowchart showing how data is read from an I/O device using the polled
I/O method.
A direct memory access controller is a mini-processor dedicated to I/O tasks. When the
CPU needs to transfer data, it issues a command to the DMA controller, which performs
the transfer and raises an interrupt once the whole thing has finished. This means that
the CPU is not interrupted mid-transfer.
The DMA controller contains registers that the CPU uses to issue commands to it. There
is a memory address register, a byte count register and control registers, which specify
the I/O port to use, whether the data is being read from the device or written to the
device, the transfer unit (byte at a time or word at a time) and the number of bytes to
transfer per burst.
Figure 5.2: A flowchart showing how data is read from an I/O device using the interrupt-driven I/O method.
5.4.1 Blocking
When a process makes a system call, the process is moved to a waiting for I/O queue.
When the call completes, the process is moved back to the Ready queue and is rescheduled.
The process will also be rescheduled if an error occurs. If an error does not occur, all
system calls will return all of the requested data. An example of an error is if the file was
not found, or the process does not have the required permission to access it.
5.4.2 Non-blocking
With a non-blocking call, the process is not rescheduled after making the call. Instead
of the process asking for, say, 100 bytes, it will request up to 100 bytes and instead of
the call returning all 100 bytes (or an error), it will return immediately with as much as
possible. Often this will be zero bytes, which is a valid success. Non-blocking I/O may
be useful for user-interface code which checks whether a key has been pressed.
Figure 5.3: A flowchart showing how data is read from an I/O device using a DMA
controller.
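The contrast with a blocking read can be seen with a pipe marked non-blocking in Python (a POSIX-only sketch; an empty non-blocking read surfaces as BlockingIOError, which here plays the role of the zero-byte result):

```python
import os

r, w = os.pipe()
os.set_blocking(r, False)   # mark the read end of the pipe non-blocking

try:
    data = os.read(r, 100)  # ask for *up to* 100 bytes
except BlockingIOError:
    data = b""              # nothing available yet: zero bytes, a valid success
print(len(data))            # 0

os.write(w, b"hello")
data2 = os.read(r, 100)     # returns what is available, not the full 100 bytes
print(data2)                # b'hello'
```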
5.4.3 Asynchronous
Similar to non-blocking calls, asynchronous calls do not make the process wait. Instead,
a signal (a software interrupt) is sent to the process when the data from the system call
is ready. This allows the process to get all of the requested data and continue executing
while the call takes place.
Consider a process trying to print some characters. The printer is a lot slower than the
memory, so instead of having the CPU sit and wait for the printer, the data can be copied
to a buffer.
There are different kinds of buffers:
Single buffering - the OS assigns a buffer to the user request
Double buffering - one buffer is emptied while a second is filled up, then they swap
Circular buffering - useful for buffering data streams, old data is overwritten if
consumer cannot keep up
5.6 Caching
We have already looked at cache memory and have learned that if a program needs to
read a memory word, the hardware checks to see if the data is in the cache before going
to main memory. What we have not discussed is how the operating system decides what
to cache.
6 File systems
The file system is the part of an operating system that deals with files, the units in which
information is stored on disk. The file system has two main parts: the directory service,
which maps file names to file identifiers and handles access and existence, and the storage
service, which provides the mechanism to store data on disk.
6.1 Files
A file is an abstract block of arbitrary information seen by a user. Files are used to
simplify the storage procedure so the user does not need to worry about how and where
the data is stored on disk.
Type - if the system enforces file types
Protection - permissions regarding read/write/execute
Time and user ID - who created and modified the file
This information is known as the meta-data and is stored in the file control block.
6.1.4 Hardlinks
A hardlink, also known as an alias, is a second name for a particular file. This allows
multiple files, with different names and in different directories, to point to the same data
on disk. The link does not have its own meta-data; it uses the same meta-data as the
original file. If the original file is moved, renamed or modified, the hardlink will continue
to refer to the same file. Hardlinks can only exist within a single file system and cannot
span mount points, in the same way that a file cannot be created in one file system with
its data stored in another; it does not make sense to do this.
Most operating systems do not allow hard links to directories, because this would create
the possibility of endless cycles.
When a file is deleted, how do we know if the data can be removed from the disk and the
space freed? Usually this isn't a problem because only one file will point to the data, and
if that file is deleted, then the data is no longer needed. However, with hard links, there
may be other references to the same data: although one file has been deleted, there may
be more files that still exist, linking to the same data. To overcome this problem, a
link counter can be used. This is incremented every time a hardlink is made to a file and
decremented every time one is removed. The OS can deallocate the space on disk only
when the link counter reaches zero.
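On a POSIX system the link counter is visible as `st_nlink`, which a short sketch can demonstrate (the file names here are illustrative):

```python
import os
import tempfile

# Demonstrate the link counter (st_nlink) on a POSIX system
d = tempfile.mkdtemp()
original = os.path.join(d, "a.txt")
with open(original, "w") as f:
    f.write("data")

alias = os.path.join(d, "b.txt")
os.link(original, alias)                       # second name, same i-node
count_after_link = os.stat(original).st_nlink  # 2: two names for one file

os.remove(original)                            # one name removed
count_after_remove = os.stat(alias).st_nlink   # 1: data still allocated
```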
In UNIX, /etc/passwd holds a list of password entries, each of the form
Username:Password:UID:GID:User ID info:Home directory:Shell.
A one-way function is used to encrypt passwords; that is, a function which is easy to
compute in one direction but hard to invert. This is a bit like a telephone directory:
it's very easy to convert a name to a phone number, but doing a lookup in the other
direction is difficult.
To login, the following steps are performed.
1. Get username
2. Get password
3. Encrypt password
4. Check encrypted password against the version in /etc/passwd
5. If they match, instantiate login shell
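The steps above can be sketched as follows. SHA-256 stands in for the one-way function here for illustration only; the real UNIX scheme uses crypt-style hashes, and the username and password are made up:

```python
import hashlib

# sha256 stands in for the one-way function (not the real crypt(3) scheme)
def one_way(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

# A toy password table standing in for /etc/passwd: it stores hashes only
passwd = {"jake": one_way("pa55word")}

def login(username: str, password: str) -> bool:
    stored = passwd.get(username)          # steps 1-2: get username, password
    # step 3-4: encrypt the supplied password, compare with the stored version
    return stored is not None and one_way(password) == stored

ok = login("jake", "pa55word")    # True: would instantiate a login shell
bad = login("jake", "guess")      # False
```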
The passwd file contains useful information; for example, the User ID info field can
contain data such as the user's full name. For this reason, everyone is granted read
permission. Unfortunately, this opens users up to attack: it allows other users of the
system to read their encrypted passwords and attempt to crack them offline. To solve
this, an x is placed in the password field in the passwd file, indicating that the encrypted
password is actually stored in /etc/shadow: a file readable by the superuser only.
6.2 Directories
A file system could simply store every file in the same place, with no structure. This would
be simple to implement but not good for organisation or easily setting permissions. A
directory structure (with more than one directory) is a more sensible approach.
One improvement from having a single directory is to implement a two-level directory,
where each user gets their own directory. From a single user's point of view, however,
this approach is just as bad as the first.
A hierarchical directory system is the most flexible: directories can contain more
directories! Files can be referred to using either an absolute path name, which starts at
the root directory (e.g. /usr/Jake/pictures/dog.jpg), or a relative path name, which
gives a file path relative to the current working directory.
UNIX has a distinguished root directory called /.
Each directory has two special entries: . and .. (pronounced dot and dotdot).
Dot refers to the directory itself while dotdot refers to its parent directory. These can
be used to access files higher up the hierarchy when using relative paths. For example,
../file.doc refers to the file with file name file.doc in the directory above the current
working directory.
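Resolving . and .. is a purely lexical operation, which can be sketched with Python's `posixpath` module (using the example paths from the text; real systems also account for symbolic links):

```python
import posixpath

# Lexically resolving . and .. against a current working directory
cwd = "/usr/Jake/pictures"
parent_file = posixpath.normpath(posixpath.join(cwd, "../file.doc"))
same_dir = posixpath.normpath(posixpath.join(cwd, "./dog.jpg"))

print(parent_file)  # /usr/Jake/file.doc
print(same_dir)     # /usr/Jake/pictures/dog.jpg
```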
Directories can be stored as files on disk, each with their own SFID.
Figure 6.1: The structure of an i-node in UNIX with addresses of 12 blocks with further
indirect addresses.
Owner   Group   World
r w x   r w x   r w x
Three bits are used for each of owner, group and world. The three bits correspond to
read, write and execute permissions respectively.
Directory permissions When the read permission is set for a directory, permission to list
the files in the directory is granted. When the write permission is set for a directory, users
have the ability to modify entries in the directory. This includes creating, deleting and
renaming files. Finally, when the execute permission is set for a directory, permission is
granted to access file contents and meta-data, but not to list the files inside the directory
(unless read is also set).
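The nine bits map directly onto the octal permission values used by commands such as chmod, as a small sketch shows:

```python
def mode_bits(rwx: str) -> int:
    """Convert a 9-character owner/group/world string such as 'rwxr-x--x'
    into the corresponding octal permission value."""
    assert len(rwx) == 9
    value = 0
    for ch in rwx:
        # each position contributes one bit: set unless the character is '-'
        value = (value << 1) | (ch != "-")
    return value

print(oct(mode_bits("rwxr-x--x")))  # 0o751: owner rwx, group r-x, world --x
print(oct(mode_bits("rw-r--r--")))  # 0o644: a common default for files
```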
6.3.1.3 Disk layout
Figure 6.3: The layout of blocks on a disk using the UNIX file system.
A UNIX disk is made up of a boot block, used to start the computer, followed by a number
of partitions, also made up of blocks. Block 1, in any partition, is the superblock, which
contains the number of i-nodes, the number of blocks on the disk and the start of the list
of free blocks. After the superblock is the set of i-nodes and then the data blocks.
6.3.1.4 Pipeline
In UNIX, a pipeline is a set of processes where the output of one process is used as the
input to the next one. Each pair of processes is connected by a pipe, which is a first-in
first-out (FIFO) data structure. If process A wants to send data to process B, it writes
the data to the pipe as though it were a file. Process B can then read the pipe, also as
though it were a file.
Pipes are created using the pipe system call, which creates the pipe and returns a pair
of file descriptors referring to the read and write ends of the pipe.
The UNIX operating system buffers the first process's output in the pipe. If the buffer
fills up, then the first process is blocked until the second process is ready to receive
again.
A parent process can open an anonymous pipe and have a child process inherit the other
end of the pipe, or it can create several new processes and form a pipeline. A named
pipe is used for inter-process communication. A named pipe exists beyond the life of the
process and must be deleted once it is no longer being used. A named pipe also appears
as a file to a process.
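The pipe system call described above can be sketched in Python, where `os.pipe` returns the pair of file descriptors directly:

```python
import os

# pipe() returns two file descriptors: a read end and a write end
read_fd, write_fd = os.pipe()

os.write(write_fd, b"hello through the pipe")  # writer treats it like a file
os.close(write_fd)                             # signals end-of-stream to the reader

data = os.read(read_fd, 1024)                  # reader treats it like a file too
os.close(read_fd)
print(data)
```

In a real pipeline the two descriptors would be held by different processes, with a child inheriting one end after fork.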
Figure 6.4: A file allocation table storing files as linked lists of disk clusters.
Figure 6.4 shows how the directory entry for a file gives an offset into the FAT. In the
case of File A, the first block of the file is in cluster 2, and by following the pointers, we
can see that the rest of the file is in cluster 3 and then cluster 5, where the file ends.
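Following a file's chain of clusters through the FAT can be sketched with a toy table mirroring File A in Figure 6.4 (the entries are illustrative, not a real on-disk FAT):

```python
# A toy FAT: fat[cluster] gives the next cluster of the file, or None at
# end-of-file. These entries mirror File A in Figure 6.4: clusters 2 -> 3 -> 5.
EOF = None
fat = {2: 3, 3: 5, 5: EOF}

def clusters_of(first_cluster):
    """Follow the FAT pointers from a file's first cluster to its last."""
    chain = []
    cluster = first_cluster
    while cluster is not None:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(clusters_of(2))  # [2, 3, 5]
```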
FAT16 could only handle 2GB partitions and used 32KB clusters (which is rather large,
and large cluster sizes lead to internal fragmentation when the files stored in them do
not use up all of the space).
FAT32 was an improvement and could handle 8GB partitions with 4KB clusters. Further
enhancements include the ability to place the root directory anywhere in a partition (in
FAT16, the root directory had to immediately follow the FATs) and the ability to use a
backup copy of the FAT instead of the default. Of course, space is used more efficiently
than in FAT16 because of the smaller cluster size.
6.3.2.2 NTFS
Newer versions of Windows no longer use FAT file systems, but instead use NTFS (NT
file system).
The fundamental structure in NTFS is a volume. A volume may be a portion of a disk,
a full disk, or a collection of disks. Each volume is a linear sequence of fixed-size clusters.
Clusters are referred to using a 64-bit number which gives the offset from the start of the
volume.
File names in NTFS are limited to 255 characters while full file paths are limited to
32767 characters. Furthermore, file names use Unicode, allowing non-Latin characters to
be used.
All of the files and directories on a volume are stored in the Master File Table (MFT)
which is made up of 1KB records. Each record describes one file or directory by listing
the file's attributes (including the file name) and the disk addresses where its blocks are
located. If the file is too large, a second record can be used to continue listing the blocks.
The MFT is a file itself and can be stored anywhere in a volume. The OS finds the file
by looking in the boot block of the disk, where the location of the MFT is set when the
OS is installed. The MFT can also grow as more records are required.
To aid recovery after a system crash, all file system data structure updates are performed
inside transactions. Before a data structure is altered, the transaction writes a log record
that contains redo and undo information. After the transaction, the log is updated to say
that it succeeded. If the system crashes, the log can be processed to restore the system
to a consistent state.
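A toy sketch in the spirit of this redo/undo logging (not NTFS's actual on-disk format; the record layout here is invented for illustration):

```python
# A toy write-ahead log: each record stores enough to redo or undo one
# update, and a "commit" marker is appended once the transaction succeeds.
log = []
data = {"size": 100}

def update(key, new_value):
    log.append({"key": key, "undo": data.get(key), "redo": new_value})
    data[key] = new_value
    log.append("commit")   # the transaction completed

def recover():
    # Walk the log backwards, undoing any update never followed by a commit
    for record in reversed(log):
        if record == "commit":
            break          # everything earlier was committed
        data[record["key"]] = record["undo"]

update("size", 200)
# Simulate a crash mid-transaction: the log record was written, the data
# structure was altered, but the commit marker never made it to the log.
log.append({"key": "size", "undo": data["size"], "redo": 300})
data["size"] = 300

recover()
print(data["size"])  # 200: the uncommitted update was rolled back
```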
Other features of NTFS include automatic compression when files are written to disk and
file encryption.
File systems need to be mounted in order to be used. One file system will be treated
by the OS as the root file system. Subsequent file systems are mounted on an existing
directory in an already mounted file system (usually the root). The directory is called a
mount point. Figure 6.5 shows a file system mounted on the /usr directory of the root
file tree. The previous contents of the /usr directory become invisible and it now refers
to the root directory of the newly mounted file system.
Figure 6.5: A file system mounted on the file tree of the root file system.
Older versions of Windows do not mount all file systems in a single tree but instead
maintain a forest of file trees. That is, you get A: and C: directories, which have no
parents. In newer versions of Windows, My Computer is the parent of each file system.
7 Protection
With confidential information being stored on computers, protection and security is a
vital part of operating system design.
Some techniques for protecting the system have already been discussed. These include
having two modes of operation (user mode and kernel mode), memory-management
hardware which keeps processes' address spaces separate, and access control on files.
7.1 Authentication
It is important for modern operating systems to authenticate users, to avoid unauthorised
access and to make sure the correct services, files, etc. are presented to the user. Using a
password, along with some sort of username, is the most common form of authentication.
When designing a login field, it is best to obscure the characters of the password as they
are typed in. An even better method is to hide the characters altogether, but the user
often wants visual feedback regarding how many characters they've typed in so far.
There are a few common mistakes made when authenticating a username and password.
Firstly, it is important that any incorrect username and password combination takes the
same amount of time to return with the error message. For example, if a password of 20
characters is typed in, the system may compare each character in turn with the version
of the password on file. If the first character is wrong, it may return immediately with an
error message, however if the 20th character is wrong, it will have taken a small amount
of time to check the previous 19 characters before returning the error. By measuring this
amount of time, crackers can deduce how many characters in the password are correct.
Secondly, error messages such as "invalid username" or "invalid password" can greatly
help a cracker compared to a single error message that says "invalid username or
password". This kind of implementation tells the cracker whether they have a correct
username or not, which is much more useful to them than just being told that the
username and password combination is wrong.
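The timing problem described above is usually solved with a constant-time comparison, sketched here using Python's standard library:

```python
import hmac

def check_password(supplied: str, stored: str) -> bool:
    # compare_digest examines the whole input regardless of where the first
    # mismatch occurs, so the time taken does not reveal how many leading
    # characters of the password were correct
    return hmac.compare_digest(supplied.encode(), stored.encode())

match = check_password("secret", "secret")    # True
nomatch = check_password("sexxxx", "secret")  # False, in roughly the same time
```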
How do we store the passwords in the computer? The simplest method is to keep a file
listing all of the username and password pairs. The obvious safeguard is to make this
file accessible only to the login program. This isn't totally secure, though: if somebody
did manage to get hold of the file, everybody's passwords would be exposed. A better
method is to scramble the passwords before storing them by using a one-way function.
7.1.2 Salts
Adding a salt to a password before hashing it can help to make rainbow tables ineffective.
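A minimal sketch of the idea, using the fixed salt "salt" from the worked example below (real systems use a random per-user salt stored alongside the hash, and SHA-256 here stands in for the system's one-way function):

```python
import hashlib

# Append the salt before hashing (illustrative scheme, not real crypt(3))
def hash_password(password: str, salt: str = "salt") -> str:
    return hashlib.sha256((password + salt).encode()).hexdigest()

stored = hash_password("pa55word")   # what the system keeps on file

# A rainbow table precomputed for plain sha256("pa55word") no longer matches
unsalted = hashlib.sha256(b"pa55word").hexdigest()
print(unsalted != stored)   # True: the salt defeats the precomputed table
```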
Let's say that the system adds the word salt onto each password before generating the
hash, and Alice uses pa55word as her password. When Alice enters her password, the
system appends the salt, hashes the result and compares it with the stored hash.
7.3 Viruses
A computer virus is a program that can replicate itself and spread through computers by
attaching itself to an existing program. A virus will make unauthorised changes to the
computer, whether these changes are done with malicious intentions or not. Either way,
a virus is an unwanted piece of software.
7.4 Worms
A worm is a program that replicates itself and spreads to other computers, but unlike
a virus, does not need to attach itself to an existing program. A worm is a standalone
program that causes harm to computers and networks.