
Virtual File System (VFS)

 INTRODUCTION TO VFS
 VFS SUPER BLOCK
 VFS INODES
 FILES & ITS OPERATIONS
 DENTRY
 REGISTERING & MOUNT

 Different operating systems use different file systems.

 Sometimes we need a mechanism that can interact with different
file systems (on-disk or not) and represent them in the host
environment.

 Linux manages to support multiple disk types in the same way
other Unix variants do: through a concept called the Virtual
File System (VFS).

 VFS is a software layer in the kernel that provides the file
system interface to user-space programs.

 It also provides an abstraction within the kernel that allows
different file system implementations to coexist.

 $ cp /floppy/TEST /temp/test

 /temp – a local Linux directory (typically ext2)

 /floppy – mount point of an MS-DOS diskette

 cp does not know anything about ext2 or MS-DOS

 cp interacts with the VFS by means of generic system calls
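As a rough illustration of what "generic system calls" means here, the sketch below copies a file using only open, read, write and close. It is not the source of cp; the function name copy_file and the 4096-byte buffer are assumptions made for this example. Nothing in it depends on whether the source or destination lives on ext2, MS-DOS or anything else.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst using only generic system calls.
 * Returns 0 on success, -1 on any error.
 * copy_file is a hypothetical helper, not part of any real cp. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {              /* write() may be partial */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                n = -1;                 /* remember the failure */
                break;
            }
            done += w;
        }
        if (n < 0)
            break;
    }
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

Calling copy_file("/floppy/TEST", "/temp/test") would go through exactly the same code path regardless of the file systems behind the two paths; the kernel's VFS layer does the routing.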

 VFS uses a common file model, built from a set of "objects"
(data structures), to represent the supported file systems.

 These data structures are:

◦ Superblock structure
Stores information about a mounted file system.

◦ Inode structure
Stores general information about a specific file.
 File structure
o Stores information about the interaction between an open file
and a process.

 Dentry object
o Stores information about the linking of a directory
entry with the corresponding file.

 vfsmount and nameidata
o vfsmount describes a mounted file system instance (its mount point
and root); nameidata holds the state of a pathname lookup in progress.

VFS SUPER BLOCK

 A separate superblock structure is maintained for every
mounted file system.

 A mounted file system is represented by a struct super_block,
which is initialized by the VFS function read_super().

 If the superblock is already present, read_super() returns it;
otherwise it calls the file-system-specific function to create a
superblock.

 The superblock of a mounted file system contains information
such as:
Block size
Access rights, etc.

 The VFS keeps a list of the mounted file systems with their
VFS superblocks.

 The fact that a file system is mounted is recorded in a
vfsmount structure.

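/* Take the superblock lock, sleeping first if it is already held. */
void lock_super(struct super_block *sb)
{
    if (sb->s_lock)
        wait_on_super(sb);
    sb->s_lock = 1;
}

/* Drop the lock and wake up anyone sleeping on the superblock. */
void unlock_super(struct super_block *sb)
{
    sb->s_lock = 0;
    wake_up(&sb->s_wait);
}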

 Each specific file system can define its own
superblock operations.
 Example: the read_inode() operation is actually
executed as sb->s_op->read_inode(inode)
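The dispatch through sb->s_op can be sketched in user space with a table of function pointers. This is only a toy model under stated assumptions: the two file systems (toyext2, toymsdos), the reduced struct layouts and the wrapper vfs_read_inode are invented for illustration and are not the kernel's definitions.

```c
struct super_block;

/* Reduced, hypothetical inode: just enough fields for the demo. */
struct inode {
    unsigned long i_ino;
    int i_mode;
    struct super_block *i_sb;   /* file system this inode belongs to */
};

/* Per-file-system method table (toy version of super_operations). */
struct super_operations {
    void (*read_inode)(struct inode *);
};

struct super_block {
    const struct super_operations *s_op;
};

/* Two made-up file systems, each filling in the inode its own way. */
static void toyext2_read_inode(struct inode *inode)
{
    inode->i_mode = 0644;
}

static void toymsdos_read_inode(struct inode *inode)
{
    inode->i_mode = 0755;
}

static const struct super_operations toyext2_sops = { toyext2_read_inode };
static const struct super_operations toymsdos_sops = { toymsdos_read_inode };

/* Generic code: it never names a concrete file system. */
void vfs_read_inode(struct inode *inode)
{
    struct super_block *sb = inode->i_sb;
    sb->s_op->read_inode(inode);
}
```

The design point is that vfs_read_inode contains no file-system-specific branch; adding a third file system only means supplying another table.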
 read_inode
Reads a specific inode from the mounted file system.
 write_inode
Called for inodes that have been marked dirty.
 put_inode
Called whenever the reference count of the inode decreases.
 delete_inode
Called when the reference count of the inode reaches 0.
Deletes both the VFS inode and the disk inode.
 notify_change
Called when inode attributes are changed.
It also marks the inode as dirty.

 put_super
Releases the superblock object because the corresponding file
system is being unmounted.
 write_super
Called when the VFS decides to write the superblock to disk.
It is not needed for file systems mounted read-only.
 remount_fs
Called when a file system is to be remounted with new options.
Used to change mount options without unmounting the file system.
Example: changing a read-only file system to a writable one.
 umount_begin
Only NFS provides this operation.
Called in the early stages of the unmounting process.
Causes any incomplete transaction on the file system to fail
quickly rather than block.
It does not make an otherwise busy file system unmountable, but it
lets processes using the file system be killed rather than
remain in an uninterruptible wait.

VFS INODES

 An inode contains the management information for a
particular file.

 The inode itself contains a few block numbers to ensure
efficient access to small files.

 Access to larger files is provided via indirect blocks that
contain block numbers. The indirect blocks come in three
flavors:
◦ Indirect reference
◦ Double indirect reference
◦ Triple indirect reference
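To see why three levels of indirection are enough for large files, the arithmetic can be sketched as below. The concrete numbers in the usage note (12 direct block numbers, 4-byte block numbers, 1024-byte blocks) are ext2-style assumptions, not values given in these slides, and the function name max_file_blocks is invented for the example.

```c
#include <stdint.h>

/* Number of data blocks reachable from one inode, assuming n_direct
 * direct block numbers plus one single, one double and one triple
 * indirect block, each holding ptr_size-byte block numbers. */
uint64_t max_file_blocks(uint64_t block_size, uint64_t ptr_size,
                         uint64_t n_direct)
{
    uint64_t p = block_size / ptr_size;   /* block numbers per indirect block */
    return n_direct + p + p * p + p * p * p;
}
```

Under those assumptions, max_file_blocks(1024, 4, 12) gives 12 + 256 + 65536 + 16777216 = 16843020 blocks, or roughly 16 GiB of addressable data. Real file systems may impose other, lower limits.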

 Linux keeps a cache of active and recently used inodes.

 These inodes can be accessed in two ways.

 Through the dcache
Each dentry in the dcache refers to an inode, and thereby keeps that
inode in the cache.

 Through the inode hash table
Each inode is hashed (to an 8-bit number) based on the address of the
file system's superblock and the inode number. Inodes with the same
hash value are then chained together in a doubly linked list.

Access through the hash table is achieved using
• The iget function (called when the inode is not found in the dcache).
• nfsd (which would be better served by having the file system provide
a filehandle-to-inode mapping function).
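The hash-table access path can be modelled in a few lines. This is a minimal user-space sketch, assuming a simple XOR-and-fold mixing function (the kernel's actual hash differs) and reduced struct layouts; find_inode here is an illustrative lookup helper, while insert_inode_hash borrows the name used later in these slides.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 8
#define HASH_SIZE (1u << HASH_BITS)     /* 256 buckets: an 8-bit hash */

struct super_block { int unused; };

struct inode {
    struct super_block *i_sb;
    unsigned long i_ino;
    struct inode *next, *prev;          /* chain within one bucket */
};

static struct inode *inode_hashtable[HASH_SIZE];

/* Illustrative mixing of superblock address and inode number. */
static unsigned inode_hash(const struct super_block *sb, unsigned long ino)
{
    uintptr_t h = (uintptr_t)sb ^ (uintptr_t)ino;
    h ^= h >> 8;
    h ^= h >> 16;
    return (unsigned)(h & (HASH_SIZE - 1));
}

/* Link the inode at the head of its bucket's doubly linked chain. */
void insert_inode_hash(struct inode *inode)
{
    struct inode **head =
        &inode_hashtable[inode_hash(inode->i_sb, inode->i_ino)];
    inode->prev = NULL;
    inode->next = *head;
    if (*head)
        (*head)->prev = inode;
    *head = inode;
}

/* Walk one bucket; both superblock and inode number must match. */
struct inode *find_inode(struct super_block *sb, unsigned long ino)
{
    struct inode *i = inode_hashtable[inode_hash(sb, ino)];
    for (; i; i = i->next)
        if (i->i_sb == sb && i->i_ino == ino)
            return i;
    return NULL;
}
```

Keying on the superblock as well as the inode number matters because inode numbers are only unique within one file system.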

Each inode object always appears in one of several circular
doubly linked lists.

In all cases, the pointers to the adjacent elements are stored
in the i_list field.

 default_file_ops:
◦ Pointer to the default table of file operations for files opened
on this inode.
◦ When a file is opened:
• The f_op field in the file structure is initialized from this table.
• Then the open method in the file_operations table is called.
• That method may change f_op to a different method table
(e.g. for a device special file).
 create:
◦ Only meaningful on directory inodes.
◦ If successful, it:
• Gets a new empty inode from the cache with get_empty_inode.
• Fills in the fields and inserts it into the hash table with
insert_inode_hash.
• Marks it dirty with mark_inode_dirty.
• Instantiates it into the dcache with d_instantiate.

 lookup:
◦ Only meaningful on directory inodes.
◦ Checks whether the name exists in the directory; if it does, finds
and loads the inode and updates the dentry using d_add.
 link:
◦ Only meaningful on directory inodes.
◦ makes a hard link
◦ On success, calls d_instantiate to link the inode of the linked file to
the new dentry
 unlink:
◦ Only meaningful on directory inodes.
◦ Removes the name from the directory
◦ then d_delete the dentry on success.
 symlink:
◦ Only meaningful on directory inodes.
◦ Creates a symbolic link in the given directory with the given name
having the given value.
◦ On success, d_instantiate the new inode into the dentry.
 mkdir:
◦ Creates a directory with the given parent, name, and mode.
 rmdir:
◦ Remove the named directory (if empty)
◦ And d_delete the dentry.
 mknod:
◦ Creates a device special file with the given parent, name, mode, and
device number.
◦ Then d_instantiate the new inode into the dentry.
 rename:
◦ Renames the object to have the parent and name given by the second
inode and dentry.
◦ All generic checks, including that the new parent isn't a child of the old
name, have already been done.
 readlink:
◦ The symbolic link referred to by the dentry is read and its value is
copied into the user buffer (with copy_to_user), up to the maximum
length given by the integer argument.

FILES & ITS OPERATIONS

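#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void)
{
    char buffer[512];
    int infile = open("hello.txt", O_RDONLY);
    if (infile < 0) {
        perror("open");
        return 1;
    }
    /* Leave room for the terminator: read() does not add one. */
    ssize_t n = read(infile, buffer, sizeof(buffer) - 1);
    if (n < 0) {
        perror("read");
        close(infile);
        return 1;
    }
    buffer[n] = '\0';
    printf("%s\n", buffer);
    close(infile);
    return 0;
}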

 What file system does "hello.txt" live on?

 Do the open(), read(), and close() functions know exactly how
to perform those operations within the different file systems?

 The task of opening, reading and writing files will be
different in different file systems.

 You make a call to a standard C library function.

 The library function actually invokes the read system call.

 The system call figures out which file system we are reading
from and passes the read request on to the appropriate file
system driver.

 Suppose a file system is being mounted.

 The kernel checks the file system type.

 It calls the file system driver and asks for the superblock.

 The driver calls its get_sb() function and passes the VFS
superblock object to the kernel.

 A user tries to access the file "/usr/bin/nano" on the mounted
file system.

 The kernel asks the file system driver to check whether this file exists.

 The driver calls its read_inode() function to read the inode.

 read_inode() must find the inode on the disk, and then create
and fill in a VFS inode struct.

 dentry objects are also created.

 The kernel asks the driver for a file object that represents the
file "/usr/bin/nano".

 The file system driver creates and initializes the VFS file struct and
returns it to the kernel.

 The kernel tells the user that the file is now ready.

 Here, all the tasks for which the kernel is responsible are done by
the VFS layer.

 The VFS file object is an in-memory representation of an
open file.

 One is created for every open file.

 It contains data such as the dentry, access mode (r, w, rw), offset
position, file_operations struct, etc.

 It has no corresponding image on disk, so it has no "dirty" field.

 Since several processes may access the same file concurrently,
the file pointer (current offset) cannot be kept in the inode object.

 Each file object is always included in one of the
following circular doubly linked lists:

o List of "unused" file objects
1. Acts as a memory cache for the file objects.
2. The f_count field is 0.
3. The address of the first element in the list is stored in the
free_filps variable.

o List of "in use" file objects
1. Each element in the list is used by at least one process.
2. The f_count field is nonzero.
3. The address of the first element in the list is stored in the
inuse_filps variable.

Defined in linux/fs.h

 f_next,f_pprev
 link files together into one of a number of lists
 There is one list for each active file-system, starting at the s_files pointer
in the super-block

 f_dentry
 records the dcache entry that points to the inode for this file

 f_op
 points to a struct containing methods to use on this file

 f_count
 number of references to this file, one for each different user-process
file descriptor.

 f_flags
 stores the flags for the file, such as access type (r/w), non-blocking,
append-only, etc.
 flags like O_CREAT, O_TRUNC, etc. are relevant only at the time of
opening, so they are not stored in f_flags

 f_mode
 f_mode stores the read and write access as two separate bits

 f_pos
 records current file position for the next read/write request

 f_reada, f_remax, f_raend, f_ralen, f_rawin
 These five fields are used to keep track of sequential access patterns on
the file and to determine how much read-ahead to do

 f_owner
 Stores a process id and a signal to send to the process when certain
events happen with the file

 f_uid, f_gid
 get set to the owner and group of the process which opened the file

 f_error
 used by the NFS client file-system code to return write errors

 f_version
 used by the underlying fs to help cache state, and check for
the cache being invalid
 changes whenever the file has its f_pos value changed

 private_data
 used by device drivers, and even a few file-systems, to store extra per-
open-file information.

Defined in linux/fs.h:
struct file_operations {
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *);
    int (*fasync) (int, struct file *, int);
    int (*check_media_change) (kdev_t dev);
    int (*revalidate) (kdev_t dev);
    int (*lock) (struct file *, int, struct file_lock *);
};

 llseek(file, offset, whence)
 implements the lseek system call
 called when the VFS needs to move the file position index
 updates the f_pos field

 read(file, buf, count, offset)
 implements the read system call

 write(file, buf, count, offset)
 allows writing to a file, such as when using the write system call

 readdir(dir, dirent, filldir)
 returns the next directory entry of a directory in dirent

 poll(file, poll_table)
 used to implement the select and poll system calls

 ioctl(inode, file, cmd, arg)
 sends a command to an underlying hardware device
 applies only to device files

 mmap(file, vma)
 implements memory mapping of files into a process address
space

 open(inode, file)
 opens a file by creating a new file object and linking it to the
corresponding inode object
 initialises the f_op member with the default_file_ops in the inode
structure

 flush(file)
 called when a file descriptor is closed
 f_count of the file object is decremented
 only fs that currently defines this method is the NFS client

 release(inode, file)
 called when the last reference to an open file is closed
 f_count field of the file object becomes 0
 releases the file object

 fsync(file, dentry)
 implements the fsync system call

 fasync(file, on)
 Enables or disables asynchronous I/O notification by means of
signals
 No file-systems currently use this method

 check_media_change(dev)
 checks whether there has been a change of media since the last
operation on the device file
 applicable to block devices that support removable media, such as
CDROM
 called in read_super when a file-system is about to be mounted
 If it returns true at this point, all buffers associated with the device are
invalidated

 revalidate(dev)
 called after buffers have been invalidated after a media change, as
reported by check_media_change

 lock(file, cmd, file_lock)
 allows a file service to provide extra handling of POSIX locks
 particularly useful for network file systems
 when a process is trying to find out what locks are present,
the information returned by this method is used

The methods just described are available to all possible file
types. However, only a subset of them applies to a specific
file type; the fields corresponding to unimplemented
methods are set to NULL.
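The NULL convention is what lets generic code decide whether an operation is supported. The user-space sketch below models this with a two-method table; the "hello" file type, the wrappers vfs_read and vfs_write, the use of off_t in place of loff_t, and the choice of -EINVAL for a missing method are assumptions made for the illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct file;

/* Cut-down method table: only read and write are modelled. */
struct file_operations {
    ssize_t (*read)(struct file *, char *, size_t, off_t *);
    ssize_t (*write)(struct file *, const char *, size_t, off_t *);
};

struct file {
    off_t f_pos;
    const struct file_operations *f_op;
};

/* Hypothetical read-only file type backed by a constant string. */
static ssize_t hello_read(struct file *f, char *buf, size_t count, off_t *pos)
{
    static const char data[] = "hello\n";
    size_t len = sizeof data - 1;
    (void)f;
    if ((size_t)*pos >= len)
        return 0;                          /* end of file */
    if (count > len - (size_t)*pos)
        count = len - (size_t)*pos;
    memcpy(buf, data + *pos, count);
    *pos += (off_t)count;
    return (ssize_t)count;
}

/* write is left NULL: this file type does not implement it. */
static const struct file_operations hello_fops = { .read = hello_read };

ssize_t vfs_read(struct file *f, char *buf, size_t count)
{
    if (!f->f_op || !f->f_op->read)
        return -EINVAL;
    return f->f_op->read(f, buf, count, &f->f_pos);
}

ssize_t vfs_write(struct file *f, const char *buf, size_t count)
{
    if (!f->f_op || !f->f_op->write)
        return -EINVAL;
    return f->f_op->write(f, buf, count, &f->f_pos);
}
```

With a file whose f_op is &hello_fops, reads advance f_pos through the string while every write is rejected by the generic wrapper, without the file type having to provide a stub.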

DENTRY

• A directory or file entry on a file system is transformed by
the VFS into a dentry object.

• The kernel creates a dentry object for every component of a
pathname that a process looks up.

• The dentry object associates the component with its
corresponding inode.

For example, when looking up the /tmp/test pathname,
the kernel creates three dentry objects:
one for the / root directory,
a second for the tmp entry of the root directory,
a third for the test entry of the /tmp directory.
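The component-by-component view of a pathname can be sketched as a small splitter that lists the names a lookup would visit. The function split_path and its limits are hypothetical; the real lookup code builds dentries as it walks rather than returning an array of names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 16
#define NAME_MAX_LEN   64

/* Split an absolute path into the names a lookup would visit.
 * names[0] is always "/" for the root directory. Returns the number
 * of components, or -1 if the path is relative, has more than max
 * components, or contains a component that is too long. */
int split_path(const char *path, char names[][NAME_MAX_LEN], int max)
{
    int n = 0;
    if (path[0] != '/' || max < 1)
        return -1;
    strcpy(names[n++], "/");
    while (*path) {
        while (*path == '/')            /* skip repeated slashes */
            path++;
        size_t len = strcspn(path, "/");
        if (len == 0)
            break;                      /* trailing slash or end */
        if (n == max || len >= NAME_MAX_LEN)
            return -1;
        memcpy(names[n], path, len);
        names[n][len] = '\0';
        n++;
        path += len;
    }
    return n;
}
```

For "/tmp/test" this yields three names ("/", "tmp", "test"), matching the three dentry objects in the example.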

• Dentry objects have no corresponding image on disk.


 A dentry object contains information such as the name of the
file or folder it represents, the associated inode, the list of child
dentries (if it is a directory), and the parent dentry.

 It is used to verify that a file or directory actually exists,
and to traverse a file system's directory tree by following a
dentry's parents and children.

 Dentries act as an intermediary between files and inodes.
Each file points to the dentry that it has open. Each
dentry points to the inode that it references. This implies
that for every open file, the dentry of that file, and of all the
parents of that file, are cached in memory. This allows the full
path name of every open file to be easily determined.

 Dentries also record file system mounting relationships.

 The VFS layer does all management of path names of files,
and converts them into entries in the dcache before
allowing the underlying file system to see them.

 Dentry objects are stored in a dentry cache called the dcache.

 Dentry objects are created and destroyed by invoking
kmem_cache_alloc( ) and kmem_cache_free( ).
Descriptions of Fields :
• d_count :
• This is a simple reference count.
• How many objects are referring to this entry.
• d_flags :
• There are currently two possible flags, both for use by specific
file-system implementations.
• d_inode :
• A pointer to the related inode
• This field may be NULL, which indicates the name is known
not to exist.
• d_parent :
• This will point to the parent dentry.
• For the root of a file-system, or for an anonymous entry like
that for a file, this points back to the containing dentry itself.
• d_mounts :
• For a directory that has had a file-system mounted on it, this
points to the root dentry of that file-system. For other
dentries, this points back to the dentry itself.
• d_covers :
• This is the inverse of d_mounts.
• For the root of a mounted file-system, this points to the
dentry of the directory that it is mounted on. For other
dentries, this points to the dentry itself.
• d_hash :
• This doubly linked list chains together the entries in one hash
bucket.
• d_lru :
• This doubly linked list chains together the unused and
negative entries.
• The head of the list is the dentry_unused global variable.
• It is stored in Least Recently Used order.
• d_child :
• This is used to link together all the children of the d_parent of
this dentry.
• d_subdirs :
• This is the head of the d_child list that links all the children of
this dentry.
• d_alias :
• As files may have multiple names in the file-system through
multiple hard links, it is possible that multiple dentries refer to
the same inode.
• When this happens, the dentries are linked through the d_alias field.
• The inode's i_dentry field is the head of this list.
• d_name :
• The d_name field contains the name of this entry, together
with its hash value.
• The name subfield may point to the d_iname field of the
dentry or to a separately allocated string.
• d_time:
• Used to record when this entry was last known to be valid.
• Used by underlying file-systems.
• d_op:
• This points to the struct dentry_operations with specifics for
how to handle this dentry.
• d_sb:
• This points to the super-block of the file-system on which the
object referred to by the dentry resides.
• d_reftime:
• This is set to the current time whenever the d_count reaches
zero, but it is never used.
• d_fsdata:
• This is available for specific file-systems to use as they wish.
• This is currently only used by nfs to store a file handle.
• d_iname:
• This stores the first 16 characters of the name of the file for
easy reference.
• If the name fits completely, then d_name.name points here,
otherwise it points to separately allocated memory.
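The relationship between d_name.name and d_iname is a small-string optimisation, which can be sketched in user space as follows. The reduced struct layouts, the helper names dentry_set_name and dentry_free_name, and the use of malloc are assumptions for illustration; this sketch also treats a name as "fitting" only when it fits together with its terminating NUL, and it does not copy a prefix of long names into d_iname.

```c
#include <stdlib.h>
#include <string.h>

#define DNAME_INLINE_LEN 16

struct qstr {
    const char *name;
    size_t len;
};

struct dentry {
    struct qstr d_name;
    char d_iname[DNAME_INLINE_LEN];     /* inline storage for short names */
};

/* Store name in the dentry. Short names live in d_iname; longer ones
 * get separately allocated memory. Returns 0, or -1 if out of memory. */
int dentry_set_name(struct dentry *d, const char *name)
{
    size_t len = strlen(name);
    char *dst;
    if (len < DNAME_INLINE_LEN) {
        dst = d->d_iname;
    } else {
        dst = malloc(len + 1);
        if (!dst)
            return -1;
    }
    memcpy(dst, name, len + 1);         /* include the terminator */
    d->d_name.name = dst;
    d->d_name.len = len;
    return 0;
}

/* Free the name only if it was allocated separately. */
void dentry_free_name(struct dentry *d)
{
    if (d->d_name.name != d->d_iname)
        free((void *)d->d_name.name);
    d->d_name.name = NULL;
}
```

The benefit is that the common case of a short file name needs no second allocation and sits in the same memory as the rest of the dentry.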
• Each dentry object may be in one of four states:
• Free
– The dentry object contains no valid information and is
not used by the VFS.
– The corresponding memory area is handled by
the slab allocator.

• Unused
– The dentry object is not currently used by the kernel.
– The d_count usage counter of the object is 0, but
the d_inode field still points to the associated inode.
– The dentry object contains valid information, but its
contents may be discarded if necessary to reclaim
memory.
• In use
– The dentry object is currently used by the kernel.
– The d_count usage counter is positive and the
d_inode field points to the associated inode object.
– The dentry object contains valid information and
cannot be discarded.

• Negative
– The inode associated with the dentry no longer
exists, because the corresponding disk inode has
been deleted.
– The d_inode field of the dentry object is set to NULL,
but the object still remains in the dentry cache so
that further lookup operations to the same file
pathname can be quickly resolved.
Reading a directory entry from disk and constructing the
corresponding dentry object takes considerable time, and
in most cases the same file is accessed repeatedly.

To maximize efficiency in handling dentries, Linux
uses a dentry cache, which consists of two kinds of data
structures:
• A set of dentry objects.
• A hash table to quickly derive the dentry object associated
with a given filename and a given directory.

The dentry cache also acts as a controller for the inode cache.
• The inodes in kernel memory that are associated with
unused entries are not discarded, since the dentry
cache is still using them and therefore their i_count fields
are not null.
• Thus, the inode objects are kept in memory and can be
quickly referenced by means of the corresponding
dentries.

All the "unused" dentries are included in a doubly linked
"Least Recently Used" (LRU) list sorted by time of insertion. The
addresses of the first and last elements of the LRU list are
stored in the next and prev fields of the dentry_unused
variable.

Each "in use" dentry object is inserted into a doubly linked
list headed by the i_dentry field of the corresponding
inode object.

An "in use" dentry object may become "negative", in which case
it is moved into the LRU list of unused dentries.
Each time the kernel shrinks the dentry cache, negative
dentries move toward the tail of the LRU list.

The hash table is implemented by means of a
dentry_hashtable array.
Each element is a pointer to a list of dentries that hash to
the same hash table value.
The d_hash field of the dentry object contains pointers to
the adjacent elements in the list associated with a single
hash value.

[Figure: dentries in the system]
The following operations are defined for each dentry.

struct dentry_operations {
    int (*d_revalidate)(struct dentry *, int);
    int (*d_hash) (struct dentry *, struct qstr *);
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
    void (*d_delete)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
};
About the operations..
• d_revalidate(dentry) :
• This method is called whenever a path lookup uses an entry
in the dcache, in order to see if the entry is still valid.
• Default method does nothing. NFS defined its own.

• d_hash(dentry, hash) :
• called to calculate hash value.

• d_compare(dir, name1, name2) :
• called when a dentry should be compared with another.
• It should return 0 only if they are the same.

• d_delete(dentry) :
• This is called when the reference count reaches zero, before
the dentry is placed on the dentry_unused list.
• d_release(dentry) :
• This is called just before a dentry is finally freed up.
• It can be used to release the d_fsdata if any.
• d_iput(dentry, ino) :
• If defined, this is called instead of iput to release the inode
when the dentry is being discarded.
REGISTERING & MOUNT

 REGISTERING AND MOUNTING AT BOOT TIME
boot() {
    file_system_setup() {
        register_file_system(&file_system_type) {
            name = get_file_system(file_system_name);
            if (name is found)
                return &file_system_type;
            else
                boot_error;
        }
    }
    mount_root() {
        filp->f_mode = root_mountflags;
        dummy_inode->i_rdev = ROOT_DEV;
        blkdev_open(filp, dummy_inode);
        for each entry in filesystem_list {
            /* Fails for every type except one; on success it creates an
               inode object and a dentry object for the root directory. */
            read_super();
        }
        fs_struct->root = fs_struct->current = dentry_obj;
        add_vfsmnt();
    }
}
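The registration step amounts to keeping a list of known file system types and searching it by name at mount time. The sketch below is a user-space toy under stated assumptions: it reuses the names register_filesystem and get_fs_type for readability, but the reduced struct file_system_type and the return values (0 and -1) are choices made for this example rather than the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Reduced type descriptor: a real one also carries mount callbacks. */
struct file_system_type {
    const char *name;
    struct file_system_type *next;
};

static struct file_system_type *file_systems;   /* head of the list */

/* Append fs to the list. Returns 0, or -1 if the name is already taken. */
int register_filesystem(struct file_system_type *fs)
{
    struct file_system_type **p = &file_systems;
    for (; *p; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -1;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Look up a registered type by name; NULL means "not registered". */
struct file_system_type *get_fs_type(const char *name)
{
    struct file_system_type *fs = file_systems;
    for (; fs; fs = fs->next)
        if (strcmp(fs->name, name) == 0)
            return fs;
    return NULL;
}
```

A mount request for an unregistered type fails at the get_fs_type step, before any device is touched.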

 sys_mount()
 Input:
1. pathname of the device file
2. pathname of the mount point
3. file_system_type
4. mount_flags
5. pointer to a file-system-dependent data structure
 {
    if (don't have permissions || not in kernel mode) exit;
    fs_type = get_fs_type();
    if (!fs_type) exit;    // reboot & register
    if (device is on-disk)
        get its dentry object and check that it is a valid, operational block device;
    else
        get_unnamed_dev();
    do_mount() {    // required parameters
        dir_d = namei(dirname);
        acquire mnt_sem;
        if (dir_d is not a directory || dir_d is root)
            exit;
        sb = read_super();
        check_remount();
        add_vfsmnt(sb);
        D_mount->dir_d = s_root;
        Mt_root->parent = dir_d;
        release mnt_sem;
    }
}

