Operating Systems
Arobinda Gupta
Spring 2012
General Information
Textbook:
Course Webpage
Grading Policy
Midsem 30%
Endsem 50%
TA 20% (two class tests; may also have assignments)
Introduction
User-centric definition
System-centric definition
Types of Systems
Batch Systems
Multiprocessing Systems
Personal Computers
Time-sharing Systems
Resources Managed by OS
Physical
Logical
Process, File, ...
Main Components of an OS
Resource-centric view
Process Management
Main Memory Management
File Management
I/O System Management
Secondary Storage Management
Security and Protection System
Networking (now integrated with most OSs, but will be covered in the Networks course)
User-centric view
System Calls
Command Interpreter (not strictly a part of an OS)
Process Management
OS responsibilities
Main-Memory Management
OS responsibilities
File Management
OS responsibilities
I/O System Management
A buffer-caching system
Device driver interface
Drivers for specific hardware devices
Secondary-Storage Management
System Calls
Command-Interpreter System
The shell
Process Management
What is a Process?
CPU time
Memory space for code, data, stack
Open files
Signals
Data structures to maintain different information about the process
Process creation
Process scheduling
Process termination
Process is removed
Resources are reclaimed
Some data may be passed to the parent process (e.g., exit status)
Parent process may be informed (e.g., SIGCHLD signal in UNIX)
Process Creation
Execution possibilities
Process Termination
Parent is exiting
Process Scheduling
Representation of Process Scheduling
Schedulers
Other Questions
Context of a Process
Context Switch
Handling Interrupts
Example: Timesharing Systems
CPU Scheduling
Types of jobs
CPU Scheduler
Dispatcher
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
Scheduling Criteria
Optimization Criteria
First-Come, First-Served (FCFS) Scheduling
[Gantt charts: schedule P1, P2, P3 completing at 24, 27, 30 (average waiting time (0 + 24 + 27)/3 = 17); schedule P2, P3, P1 completing at 3, 6, 30 (average waiting time (6 + 0 + 3)/3 = 3)]
Shortest-Job-First (SJF) Scheduling
Example of Non-Preemptive SJF
Burst times: P1 = 7, P2 = 4, P3 = 1, P4 = 4
[Gantt chart: P1 0-7, P3 7-8, P2 8-12, P4 12-16]
Example of Preemptive SJF
Burst times: P1 = 7, P2 = 4, P3 = 1, P4 = 4
[Gantt chart: P1 0-2, P2 2-4, P3 4-5, P2 5-7, P4 7-11, P1 11-16]
Properties of Exponential Averaging
Prediction: tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n)
alpha = 0:
tau(n+1) = tau(n)
Recent history does not count
alpha = 1:
tau(n+1) = t(n)
Only the actual last CPU burst counts
If we expand the formula, each successive term has less weight than its predecessor
Recent history has more weight than old history
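The recurrence for the predicted burst can be exercised directly; the following is a small sketch (the function name is mine, not from the slides):

```c
#include <assert.h>

/* Exponentially weighted prediction of the next CPU burst:
 * tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n)           */
double next_prediction(double alpha, double tau_n, double t_n) {
    return alpha * t_n + (1.0 - alpha) * tau_n;
}
```

With alpha = 1/2 each update is just the average of the last prediction and the last observed burst, so starting from tau = 10 and bursts 6, 4 the predictions are 8, then 6.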
Priority Scheduling
Example of Round Robin with Time Quantum = 20
Burst times: P1 = 53, P2 = 17, P3 = 68, P4 = 24
[Gantt chart: P1 0-20, P2 20-37, P3 37-57, P4 57-77, then P1, P3, P4, P1, P3, P3]
Multilevel Queue
Example: foreground queue scheduled by RR, background queue by FCFS
Multilevel Feedback Queue, defined by:
number of queues
scheduling algorithm for each queue
method used to determine when to upgrade a process
method used to determine when to demote a process
method used to determine which queue a process will enter when that process needs service
Example: three queues
Process Coordination
Why is it needed?
Processes may need to share data
More than one process reading/writing the same data (a shared file, a database record, ...)
Output of one process being used by another
Needs mechanisms to pass data between processes
Interprocess Communication (IPC)
Mechanism for processes P and Q to communicate and to synchronize their actions
Establish a communication link
Implementation Questions
How are links established?
Can a link be associated with more than two processes?
How many links can there be between every pair of communicating processes?
What is the capacity of a link?
Is the size of a message that the link can accommodate fixed or variable?
Is a link unidirectional or bidirectional?
Bounded-Buffer Shared-Memory Solution
Shared data
#define BUFFER_SIZE 10
typedef struct {
    ...
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
Bounded-Buffer: Producer Process
item nextProduced;
while (1) {
    /* produce an item in nextProduced */
    while (((in + 1) % BUFFER_SIZE) == out)
        ; /* do nothing -- buffer full */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
}
Bounded-Buffer: Consumer Process
item nextConsumed;
while (1) {
    while (in == out)
        ; /* do nothing -- buffer empty */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    /* consume the item in nextConsumed */
}
Producer Process (with counter)
Shared data
#define BUFFER_SIZE 10
typedef struct {
    ...
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
int counter = 0;
item nextProduced;
while (1) {
    while (counter == BUFFER_SIZE)
        ; /* do nothing -- buffer full */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    counter++;
}
Consumer process
item nextConsumed;
while (1) {
    while (counter == 0)
        ; /* do nothing -- buffer empty */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    counter--;
}
An Illustration
Assume counter is initially 5. One interleaving of statements is:
producer: register1 = counter        (register1 = 5)
producer: register1 = register1 + 1  (register1 = 6)
consumer: register2 = counter        (register2 = 5)
consumer: register2 = register2 - 1  (register2 = 4)
producer: counter = register1        (counter = 6)
consumer: counter = register2        (counter = 4)
Race Condition
A scenario in which the final output is dependent on the relative speed of the processes
Example: the final value of the shared data counter depends upon which process finishes last
Atomic Operation
An operation that is either executed fully without interruption, or not executed at all
The operation can be a group of instructions
Ex. the instructions for counter++ and counter--
Note that the producer-consumer problem's solution works if counter++ and counter-- are made atomic
In practice, the process may be interrupted in the middle of an atomic operation, but the atomicity should ensure that no process uses the effect of the partially executed operation until it is completed
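C11's stdatomic library provides exactly this kind of atomic counter increment. The following sketch (helper names and iteration counts are mine; it assumes POSIX threads are available) shows that with atomic_fetch_add no increment is lost, so the final counter value is exact regardless of interleaving:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* counter++ as a single atomic operation: with atomic_fetch_add no
 * increment can be lost, unlike the load/add/store interleaving
 * illustrated above. */
static atomic_int counter;
#define ITERS 100000

static void *incrementer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&counter, 1);   /* atomic counter++ */
    return 0;
}

/* Spawn nthreads (at most 16), each doing ITERS atomic increments. */
int run_counter_demo(int nthreads) {
    pthread_t t[16];
    atomic_store(&counter, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], 0, incrementer, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], 0);
    return atomic_load(&counter);
}
```

With the plain non-atomic counter++ of the earlier slide, the same experiment would usually return less than nthreads * ITERS.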
entry section
critical section
exit section
remainder section /* remaining code */
Solutions vary depending on how these sections are written
Peterson's Solution
Only 2 processes, P0 and P1
Processes share some common variables to synchronize their actions
int turn = 0;
turn == i means it is Pi's turn to enter its critical section
boolean flag[2];
initially flag[0] = flag[1] = false
flag[i] = true means Pi is ready to enter its critical section
Process Pi (with j = 1 - i):
do {
    flag[i] = true;
    turn = j;
    while (flag[j] && turn == j)
        ;
    critical section
    flag[i] = false;
    remainder section
} while (1);
Meets all three requirements; solves the critical-section problem for two processes
Can be extended to n processes by pairwise mutual exclusion, but that is too costly
Bakery Algorithm
Notation: < denotes lexicographical order on (ticket #, process id #)
(a, b) < (c, d) if a < c, or if a == c and b < d
max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n-1
Shared data
boolean choosing[n];
int number[n];
Data structures are initialized to false and 0, respectively
Bakery Algorithm
do {
    choosing[i] = true;
    number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
    choosing[i] = false;
    for (j = 0; j < n; j++) {
        while (choosing[j])
            ;
        while ((number[j] != 0) &&
               ((number[j], j) < (number[i], i)))
            ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
Semaphore
Widely used synchronization tool
Does not require busy-waiting
CPU is not held unnecessarily while the process is waiting
A semaphore S is
A data structure with an integer variable S.value and a queue S.q of processes
The data structure can only be accessed by two atomic operations, wait(S) and signal(S) (also called P(S) and V(S))
Example use for ordering: Pj executes
    ...
    wait(flag)
    B
so that statement B runs only after flag is signaled by the other process.
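POSIX exposes counting semaphores directly: wait() and signal() from the slides correspond to sem_wait() and sem_post(). The sketch below (the helper function and its counting protocol are mine) uses sem_trywait(), a non-blocking wait that fails instead of sleeping when the value is 0, so the behavior can be checked single-threaded:

```c
#include <assert.h>
#include <semaphore.h>

/* Initialize a semaphore to `initial` and attempt `attempts`
 * non-blocking waits; return how many succeeded.  Exactly `initial`
 * of them can succeed before the value reaches 0. */
int count_successful_waits(unsigned initial, int attempts) {
    sem_t s;
    int ok = 0;
    sem_init(&s, 0, initial);        /* S.value = initial */
    for (int i = 0; i < attempts; i++)
        if (sem_trywait(&s) == 0)    /* wait(S) without blocking */
            ok++;
    sem_destroy(&s);
    return ok;
}
```

A blocking sem_wait() would instead put the caller on the semaphore's queue, which is exactly the "no busy-waiting" property the slide describes.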
Pitfalls
Use carefully to avoid
Deadlock: two or more processes are waiting indefinitely for an event that can be caused by only one of the waiting processes
Starvation: indefinite blocking. A process may never be removed from the semaphore queue in which it is suspended
Example of Deadlock
Let S and Q be two semaphores initialized to 1
P0:
    wait(S);
    wait(Q);
    ...
    signal(S);
    signal(Q);
P1:
    wait(Q);
    wait(S);
    ...
    signal(Q);
    signal(S);
Internal Implementation of Semaphores
How do we make wait and signal atomic?
Should we use another semaphore? Then who makes that atomic?
Classical Problems of Synchronization
Bounded-Buffer Producer-Consumer Problem
Readers and Writers Problem
Dining-Philosophers Problem
Bounded-Buffer Problem
Shared data
semaphore full, empty, mutex;
Initially: full = 0, empty = n, mutex = 1
Bounded-Buffer Problem: Producer Process
do {
    /* produce an item */
    wait(empty);
    wait(mutex);
    /* add the item to the buffer */
    signal(mutex);
    signal(full);
} while (1);
Bounded-Buffer Problem: Consumer Process
do {
    wait(full);
    wait(mutex);
    /* remove an item from the buffer */
    signal(mutex);
    signal(empty);
    /* consume the item */
} while (1);
Readers-Writers Problem
A common shared data
Reader process only reads data
Writer process only writes data
Synchronization requirements
Writers should have exclusive access to the data
No other reader or writer can access the data at that time
Multiple readers should be allowed to access the data if there is no writer accessing the data
Writer:
    wait(wrt);
    /* perform write */
    signal(wrt);
Reader:
    wait(mutex);
    readcount++;
    if (readcount == 1)
        wait(wrt);
    signal(mutex);
    /* perform read */
    wait(mutex);
    readcount--;
    if (readcount == 0)
        signal(wrt);
    signal(mutex);
Dining-Philosophers Problem
Shared data
semaphore chopstick[5];
Initially all values are 1
Philosopher i:
do {
    wait(chopstick[i]);
    wait(chopstick[(i+1) % 5]);
    /* eat */
    signal(chopstick[i]);
    signal(chopstick[(i+1) % 5]);
    /* think */
} while (1);
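This solution can deadlock: if all five philosophers pick up their left chopstick at the same instant, each then waits forever for the right one. One standard fix, sketched below (the helper names are mine), is to impose a total order on the resources: every philosopher requests the lower-numbered chopstick first, which breaks the circular wait:

```c
#include <assert.h>

/* Resource-ordering fix for the dining philosophers: philosopher i
 * uses chopsticks i and (i+1) % 5, but always acquires the
 * lower-numbered one first.  Philosopher 4 therefore reaches for
 * chopstick 0 before chopstick 4, breaking the cycle. */
int first_chopstick(int i)  { int l = i, r = (i + 1) % 5; return l < r ? l : r; }
int second_chopstick(int i) { int l = i, r = (i + 1) % 5; return l < r ? r : l; }
```

With this ordering no cycle of waits can form, so deadlock is impossible (though starvation still needs separate treatment).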
Other Synchronization Constructs
Programming constructs
Specify critical sections or shared data to be protected by mutual exclusion in a program using special keywords
Compiler can then insert appropriate code to enforce the conditions (e.g., put wait/signal calls in appropriate places in the code)
Examples
Critical regions, Monitors, Barriers, ...
Memory Management
Memory Allocation
Contiguous Allocation
Non-contiguous Allocation
Contiguous Allocation
Easy to manage
Problems:
[Figure: successive memory snapshots showing the OS and processes 5, 8, 2, then 9 and 10, as holes open up and are reused]
Dynamic Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes?
Fragmentation
Bitmap method
Non-contiguous Allocation
Paging
Segmentation
Memory Abstraction
Memory Abstraction: Logical or Virtual Addresses
A Simple Solution
Accessible only by OS
Paging
Page Table
Address Translation Architecture
Hierarchical Paging
Hashed Page Tables
Inverted Page Tables
Two-level address format (32-bit): page number split into p1 (10 bits) and p2 (10 bits), page offset d (12 bits)
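The 10/10/12 split of a 32-bit address into outer page number, inner page number, and offset can be sketched as plain bit manipulation (function names are mine; the layout assumed is the one shown above):

```c
#include <assert.h>
#include <stdint.h>

/* Two-level paging address split: 10 bits of outer page number (p1),
 * 10 bits of inner page number (p2), 12 bits of offset (d). */
uint32_t p1_of(uint32_t va) { return va >> 22; }            /* top 10 bits  */
uint32_t p2_of(uint32_t va) { return (va >> 12) & 0x3FF; }  /* next 10 bits */
uint32_t d_of(uint32_t va)  { return va & 0xFFF; }          /* low 12 bits  */
```

p1 indexes the outer page table, whose entry points to an inner page table indexed by p2; the frame found there is combined with d to form the physical address.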
Address-Translation Scheme
Protection
Shared Pages
Segmentation
Segmentation Architecture
Segmentation Architecture (Cont.)
Segmentation Hardware
Example of Segmentation
Sharing of Segments
Virtual Memory
Basic Concept
Demand paging
Demand segmentation
Demand Paging
Some questions
Page fault
Valid bit
Dirty/Modified bit
Valid-Invalid Bit
Page Fault
Performance of Demand Paging
Page in Disk
Page-replacement algorithm: want an algorithm which will result in the minimum number of page faults
Page Replacement
Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
First-In-First-Out (FIFO) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process): 9 page faults
4 frames: 10 page faults
Belady's Anomaly: more frames can mean more page faults
Counter-intuitive
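The fault counts above can be reproduced with a small FIFO simulator (a sketch; the function name and the 16-frame cap are mine):

```c
#include <assert.h>

/* FIFO page replacement: count page faults for a reference string
 * with a fixed number of frames (at most 16).  The frame array
 * doubles as the FIFO queue; `next` indexes the oldest page, i.e.
 * the one evicted next. */
int fifo_faults(const int *refs, int nrefs, int nframes) {
    int frames[16];
    int used = 0, next = 0, faults = 0;
    for (int r = 0; r < nrefs; r++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == refs[r]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < nframes) {
            frames[used++] = refs[r];      /* free frame available */
        } else {
            frames[next] = refs[r];        /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}
```

Running it on the slide's reference string shows Belady's anomaly directly: 3 frames give 9 faults while 4 frames give 10.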
Optimal Algorithm
Replace the page that will not be used for the longest period of time
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
4 frames: 6 page faults
LRU: Counter implementation
Every page entry has a counter; every time the page is referenced through this entry, copy the clock into the counter
When a page needs to be changed, look at the counters to determine which one to change
LRU: Stack implementation (doubly linked list of page numbers)
Page referenced:
move it to the top
requires 6 pointers to be changed
No search for replacement
Reference bit
Second chance
Second-Chance (Clock) Page-Replacement Algorithm
Counting Algorithms
Allocation of Frames
fixed allocation
priority allocation
Fixed Allocation
Priority Allocation
Locality of reference
Thrashing
Working-Set Model
Working-set window: a fixed number of recent page references
Tradeoff in choosing the window size: too small will not capture the entire locality; too large will cover several localities
Other Considerations
Prepaging
TLB Reach
Program structure
int A[][] = new int[1024][1024];
Each row is stored in one page
Program 1:
for (j = 0; j < A.length; j++)
    for (i = 0; i < A.length; i++)
        A[i,j] = 0;
1024 x 1024 page faults
Program 2:
for (i = 0; i < A.length; i++)
    for (j = 0; j < A.length; j++)
        A[i,j] = 0;
1024 page faults
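The effect of loop order on paging can be simulated: assume each of the 1024 rows of A occupies its own page and only one frame is available, so a fault occurs whenever the referenced row changes. The two functions below (names mine) model the column-major order (j outer, i inner) and the row-major order (i outer, j inner):

```c
#include <assert.h>

#define N 1024

/* Column-major traversal (j outer): the row, and hence the page,
 * changes on every access, so almost every reference faults. */
int faults_column_major(void) {
    int faults = 0, resident = -1;   /* resident = row currently in the frame */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            if (i != resident) { faults++; resident = i; }
    return faults;
}

/* Row-major traversal (i outer): a whole row is finished before
 * moving on, so each page faults only once. */
int faults_row_major(void) {
    int faults = 0, resident = -1;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i != resident) { faults++; resident = i; }
    return faults;
}
```

The simulation confirms the slide's counts: 1024 x 1024 faults versus 1024.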
File Management
Two Parts
Filesystem Interface
Filesystem Design
Filesystem Interface: Basic Topics
File Concept
Access Methods
Directory Structure
File System Mounting
File Sharing
Protection
File Concept
File Types
Data
Text, binary, ...
Program
Regular files: store information
Directory: stores information about files
Device files: represent different devices
File Structure
Lines
Fixed length
Variable length
Complex Structures
Formatted document
Relocatable load file
File Operations
Create
Write
Read
Reposition within file (file seek)
Delete
Truncate
Open(Fi): search the directory structure on disk for entry Fi, and move the content of the entry to memory
Close(Fi): move the content of entry Fi in memory to the directory structure on disk
Access Methods
Sequential Access
read next
write next
reset
Direct Access
read n
write n
position to n
read next
write next
n = relative block number
Sequential-access File
Directory Structure
[Figure: a directory with entries pointing to files F1, F2, F3, F4, ..., Fn]
A Typical File-system Organization
Information in a Device Directory
Name
Type
Address
Current length
Maximum length
Date last accessed (for archival)
Date last updated (for dump)
Owner ID (who pays)
Protection information (discussed later)
Operations Performed on a Directory
Single-Level Directory
Problems:
Naming problem
Grouping problem
Two-Level Directory
Path name
Can have the same file name for different users
Efficient searching
No grouping capability
Tree-Structured Directories
Tree-Structured Directories (Cont.)
Efficient searching
Grouping capability
cd /spell/mail/prog
type list
Tree-Structured Directories (Cont.)
Acyclic-Graph Directories
Acyclic-Graph Directories (Cont.)
File System Mounting
[Figure: filesystem tree with directories /, sys, dev, etc, bin, local, users, adm]
Ex. we can now mount some other filesystem fs2 on /usr/adm; this will hide all files under /adm of fs1, and access to /usr/adm will go to the corresponding part of fs2
File Sharing
Protection
Types of access
Read
Write
Execute
Append
Delete
List
Filesystem Implementation
Basic Topics
Disk Layout
Allocation Methods
Contiguous allocation
Linked allocation
Indexed allocation
Contiguous Allocation
Fragmentation possible
Extent-Based Systems
Linked Allocation
Linked Allocation
File-Allocation Table
Indexed Allocation
[Figure: a directory entry pointing to an index table whose entries point to the file's data blocks]
Two-level Indexing
[Figure: an outer index pointing to index tables, which point to the blocks of the file]
i-nodes
FCB in Unix
Contains file attributes and the disk addresses of blocks
One block can hold only a limited number of disk block addresses, which limits the size of a file
Solution: use some of the blocks to hold the addresses of blocks holding the addresses of the disk blocks of files
Unix i-node
File attributes
12 direct pointers
1 singly indirect pointer
1 doubly indirect pointer
1 triply indirect pointer
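The pointer structure bounds the maximum file size. A sketch of the computation (the function name is mine; it assumes the classic layout of 12 direct pointers plus single, double, and triple indirect blocks, with 4-byte block addresses):

```c
#include <assert.h>
#include <stdint.h>

/* Maximum file size reachable through a classic Unix i-node.
 * With block size `block` and 4-byte block addresses, one indirect
 * block holds block/4 pointers (p).  Addressable blocks:
 *   12 direct + p (single) + p*p (double) + p*p*p (triple).  */
uint64_t max_file_bytes(uint64_t block) {
    uint64_t p = block / 4;          /* pointers per indirect block */
    uint64_t blocks = 12 + p + p * p + p * p * p;
    return blocks * block;
}
```

For 4 KB blocks, p = 1024 and the triple-indirect term dominates, giving roughly 4 TB; in practice other filesystem limits usually bind first.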
Directory Implementation
Linear list of file names with pointers to the data blocks:
simple to program
time-consuming to execute
Hash Table: linear list with a hash data structure
Free-Space Management
Bit vector (blocks 0 to n-1):
bit[i] = 1 if block[i] is free
bit[i] = 0 if block[i] is occupied
Free-Space Management (Cont.)
Need to protect:
Virtual File Systems (VFS) provide an object-oriented way of implementing file systems
VFS allows the same system call interface (the API) to be used for different types of file systems
The API is to the VFS interface, rather than to any specific type of file system
Performance
Various Disk-Caching Locations
Page Cache
Recovery
Disk Management
Disk Structure
Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer
The 1-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially
Seek time is the time for the disk to move the heads to the cylinder containing the desired sector
Typically 5-10 milliseconds
Rotational latency is the additional time waiting for the disk to rotate the desired sector to the disk head
Typically 2-4 milliseconds
Disk Scheduling
FCFS
Illustration shows total head movement of 640 cylinders
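Under FCFS the requests are serviced in arrival order, so the total head movement is just the sum of absolute seek distances. A sketch (the function name is mine, and the request queue used below is an assumption: the classic textbook example with the head starting at cylinder 53, which yields the 640 cylinders quoted above):

```c
#include <assert.h>
#include <stdlib.h>

/* Total head movement for FCFS disk scheduling: visit the requested
 * cylinders in queue order, summing |next - current| each time. */
int fcfs_head_movement(int head, const int *queue, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(queue[i] - head);
        head = queue[i];
    }
    return total;
}
```

The same helper can be reused to compare algorithms: feed it the visit order that SSTF or SCAN would produce and the totals come out far smaller than FCFS's.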
SSTF
SSTF (Cont.)
SCAN
SCAN (Cont.)
C-SCAN
C-SCAN (Cont.)
C-LOOK
Version of C-SCAN
Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk
C-LOOK (Cont.)
Selecting a Disk-Scheduling Algorithm
Disk Management
Application Interface
CPU Scheduling
[Figure: in the 2.4 kernel, one global runqueue serves CPU1, CPU2, CPU3; in the 2.6 kernel, each CPU has its own runqueue]
Linux Scheduling
3 scheduling classes
SCHED_FIFO and SCHED_RR are realtime classes
SCHED_OTHER is for the rest
140 priority levels
1-100: RT priorities
101-140: user task priorities
Three different scheduling policies
One for normal tasks
Two for real-time tasks
Basic Philosophies
The Runqueue
Scheduler Runqueue
Fields of the task_struct relevant to scheduling: thread_info->flags, thread_info->cpu, state, prio, static_prio, run_list, array, sleep_avg, timestamp, last_ran, activated, policy, cpus_allowed, time_slice, first_time_slice, rt_priority
Finding the highest-priority nonempty queue can use the bsfl instruction on Intel
Scheduling Components
Static Priority
Sleep Average
Bonus
Dynamic Priority
Interactivity Status
Static Priority
Each task has a static priority that is set based upon the nice value specified by the task
static_prio in task_struct
Value between 0 and 139 (between 100 and 139 for normal processes)
Each task has a dynamic priority that is set based upon a number of factors
Sleep Average
Bonus
The sleep average is capped at 1 second, and the bonus ranges over 10 levels
Interactive Processes
Using Quanta
Avoiding Starvation
Swapping Arrays
struct prioarray *array = rq->active;
if (array->nr_active == 0) {
    rq->active = rq->expired;
    rq->expired = array;
}
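The swap can be verified in isolation with simplified struct definitions (a sketch, not the kernel's actual types, which carry priority bitmaps and per-priority lists): when the active array runs out of tasks, the expired array becomes active by exchanging two pointers, so the switch costs O(1) no matter how many tasks exist.

```c
#include <assert.h>

/* Simplified O(1)-scheduler array swap: two pointer assignments
 * retire the drained active array and promote the expired one. */
struct prioarray { int nr_active; };
struct runqueue  { struct prioarray *active, *expired; };

void maybe_swap(struct runqueue *rq) {
    struct prioarray *array = rq->active;
    if (array->nr_active == 0) {
        rq->active = rq->expired;   /* expired tasks become runnable */
        rq->expired = array;        /* drained array collects new expiries */
    }
}
```

This is the trick that lets the 2.6 O(1) scheduler recompute timeslices lazily: tasks move to the expired array as they use up their quanta, and one swap makes the whole set runnable again.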
Real-Time Scheduling
Real-Time Policies
First-in, first-out: SCHED_FIFO
Static priority
Process is only preempted for a higher-priority process
No time quantum; it runs until it blocks or yields voluntarily
Round-robin: SCHED_RR
As above, but with a time quantum (800 ms); RR within the same priority level
Multiprocessor Scheduling
Locking Runqueues
Processor Affinity
Load Balancing
To keep all CPUs busy, load balancing pulls tasks from busy runqueues to idle runqueues
If schedule finds that a runqueue has no runnable tasks (other than the idle task), it calls load_balance
load_balance is also called via timer
Load Balancing
Self-balancing; insertion and deletion operations in O(log n)
Ext3 Filesystem
Introduction
Common file system on Linux
Introduced in 2001
Supports a max file size of 16 GB to 2 TB
Max filesystem size can be from 2 TB to 32 TB
Maximum 32000 subdirectories under a directory
Options for block size from 1 KB to 4 KB
Block Groups
Partition Layout
Superblock
Bitmaps
Inode Tables: the inode table contains the inodes that describe the files in the group
Inodes
Ext2 inode
File type
Access rights
File length
Time of last file access
Time of last change of inode
Time of last change of file
Hard links counter
Number of data blocks
Pointers to data blocks
Access control lists
Directories
An entry contains:
File type
Name: file/subdirectory name
When Ext3 wants to delete a directory entry, it just increases the record length of the previous entry to the end of the deleted entry
Allocating inodes
Goal: get the new block near the last block allocated to the file
Preallocation
Directory entry allocation: