
Connecting with Computer Science, 2e
Chapter 10
File Structures
Objectives
In this chapter you will:
Learn what a file system does
Understand the FAT file system and its advantages and disadvantages
Understand the NTFS file system and its advantages and disadvantages
Compare common file systems
Learn how sequential and random file access work
See how hashing is used
Understand how hashing algorithms are created

Why You Need to Know About... File Structures
Knowledge of how an operating system stores and maintains data in a computer
Allows better comprehension of how a computer handles and manipulates files
Helps you keep the computer running as efficiently as possible
What Does a File System Do?
Responsibilities
Creating, manipulating, renaming, copying, and removing files on a storage device
Organizing files into common storage units called directories
Keeping track of file and directory locations
Assisting users in relating files and folders to the physical structure of the storage medium
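Programs reach these responsibilities through the operating system's file APIs. The following is a minimal Python sketch that uses the standard pathlib, shutil, and tempfile modules to create, rename, copy, and remove files inside a directory. The folder and file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def demo_file_operations() -> list[str]:
    """Create, rename, copy, and remove files in a directory; return what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "reports"          # a directory groups related files
        folder.mkdir()

        draft = folder / "draft.txt"
        draft.write_text("Quarterly numbers\n")           # create
        final = draft.rename(folder / "final.txt")        # rename
        shutil.copy(final, folder / "final_backup.txt")   # copy

        scratch = folder / "scratch.txt"
        scratch.write_text("temporary")
        scratch.unlink()                                  # remove

        # The file system tracks which names the directory now holds.
        return sorted(p.name for p in folder.iterdir())

print(demo_file_operations())  # ['final.txt', 'final_backup.txt']
```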
What Does a File System Do? (contd.)
Files used by operating systems and applications
Word-processing documents
Source code for programs you have written
Music files
Movie files
Spreadsheets
Photos
Operating systems use a file folder icon to represent a directory
What Does a File System Do? (contd.)
Figure 10-1, Files and directories in a file system are similar to documents and folders in a filing cabinet
What Does a File System Do? (contd.)
Figure 10-2, Folders and files in Windows
What Does a File System Do? (contd.)
Hard disk
Most common storage medium for a file system
Physically organized into tracks and sectors
Read/write heads move over specified areas of the hard disk to store (write) or retrieve (read) data
Random access device
Reads or writes data directly at any location on the disk
Faster than sequential access, which reads and writes from beginning to end
Makes use of the file system to organize files
File Systems and Operating Systems
File management system
Dependent on the operating system
FAT (File Allocation Table)
Used from MS-DOS to Windows ME
NTFS (New Technology File System)
Default for Windows NT-based systems (Windows 2000, XP, Vista)
Unix and Linux support several file systems
XFS, JFS, ReiserFS, ext3, others
Mac OS X file system
HFS and HFS+
FAT
Groups hard drive sectors into clusters
Increases performance by organizing blocks of sectors contiguously
Maintains a relationship between files and clusters
Each cluster has an entry in the FAT
The entry's position identifies the current cluster
The entry's value links to the next cluster of the file, or holds a special code indicating the last cluster
Keeps track of free (writable) clusters and bad clusters
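The cluster chain can be modeled as a list indexed by cluster number. Following the links from a file's first cluster yields the file's clusters in order. Below is a minimal Python sketch. The table contents are invented, and the helper names are hypothetical. The marker values are the ones FAT16 uses: 0x0000 for a free cluster, 0xFFF7 for a bad cluster, and 0xFFF8 to 0xFFFF for the last cluster.

```python
FREE = 0x0000          # FAT16 code for an unused (writable) cluster
BAD = 0xFFF7           # FAT16 code for a bad cluster
END_OF_CHAIN = 0xFFFF  # FAT16 treats any value 0xFFF8-0xFFFF as "last cluster"

def follow_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Return a file's clusters in order, starting from its first cluster."""
    clusters = []
    cluster = first_cluster
    while True:
        clusters.append(cluster)
        if len(clusters) > len(fat):
            raise ValueError("cluster chain loops back on itself")
        next_cluster = fat[cluster]
        if next_cluster >= 0xFFF8:        # special code: this was the last cluster
            return clusters
        if next_cluster in (FREE, BAD):
            raise ValueError(f"broken chain after cluster {cluster}")
        cluster = next_cluster

def free_clusters(fat: list[int]) -> list[int]:
    """List writable clusters; clusters 0 and 1 are reserved, so data starts at 2."""
    return [n for n in range(2, len(fat)) if fat[n] == FREE]

# Hypothetical FAT: index = cluster number, value = next cluster of the file.
fat = [0xFFF8, 0xFFFF, 5, FREE, END_OF_CHAIN, 7, BAD, 4, FREE]
print(follow_chain(fat, 2))   # [2, 5, 7, 4]
print(free_clusters(fat))     # [3, 8]
```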
FAT (contd.)
Figure 10-3, Sectors are grouped into clusters on a hard disk
FAT (contd.)
Hard drive organization
Partition boot sector
Contains information on how to access the volume
Main and backup FAT
If an error occurs while reading the main FAT, the backup is copied over the main FAT to ensure stability
Root directory
Contains an entry for every file and folder in the root directory
Data area
Holds the contents of files and subdirectories, allocated in clusters
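The regions are laid out one after another, so their starting sectors can be computed from a few values stored in the partition boot sector. The following is a sketch for FAT16; in FAT32, the root directory lives inside the data area instead. The parameter names are descriptive stand-ins, not the official boot-sector field names, and the default values are hypothetical. Each root directory entry is 32 bytes.

```python
def fat16_layout(bytes_per_sector: int = 512, reserved_sectors: int = 1,
                 num_fats: int = 2, sectors_per_fat: int = 256,
                 root_entries: int = 512) -> dict[str, int]:
    """Return the first sector of each FAT16 region, counted from the volume start."""
    fat_start = reserved_sectors                         # boot sector comes first
    root_start = fat_start + num_fats * sectors_per_fat  # main FAT, then backup FAT
    # 32 bytes per root directory entry, rounded up to whole sectors.
    root_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_start = root_start + root_sectors               # data area follows the root
    return {"fat": fat_start, "root_dir": root_start, "data": data_start}

print(fat16_layout())  # {'fat': 1, 'root_dir': 513, 'data': 545}
```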
FAT (contd.)
Figure 10-4, Typical FAT file system
Disk Fragmentation
Occurs when a file's clusters are scattered in different locations on the storage medium
Windows provides the Disk Defragmenter utility
Reorganizes each file's clusters contiguously
Improves performance
Minimizes movement of the read/write heads
Use regularly to keep the system running at peak performance
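One way to measure how scattered a file is: count the contiguous runs in its cluster chain. A defragmenter's goal is to bring that count down to 1 for every file. Below is a minimal Python sketch that works on a chain of cluster numbers like the one a FAT stores. The function name is hypothetical.

```python
def count_fragments(chain: list[int]) -> int:
    """Count contiguous runs in a file's cluster chain (1 = unfragmented)."""
    if not chain:
        return 0
    fragments = 1
    for prev, cur in zip(chain, chain[1:]):
        if cur != prev + 1:   # a jump means the read/write heads must move elsewhere
            fragments += 1
    return fragments

print(count_fragments([2, 5, 7, 4]))      # 4
print(count_fragments([10, 11, 12, 13]))  # 1
```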
Disk Fragmentation (contd.)
Figure 10-5, Files become fragmented as they're stored in noncontiguous clusters; a defragmenting utility moves files to contiguous clusters and improves disk performance
Advantages of FAT
Efficient use of disk space
Does not have to use contiguous space for large files
File names up to 255 characters (FAT32)
Deleted files are easy to recover
On deletion, the system places E5h in the first character of the filename
The file's data remains on the drive until its clusters are reused
To recover, replace E5h with the original first letter of the filename
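The deletion marker can be illustrated on the 11-byte 8.3 name field of a FAT directory entry. The following Python sketch touches only that name field. A real undelete tool must also rebuild the file's cluster chain, because FAT frees those entries on deletion. The helper names are hypothetical.

```python
DELETED_MARKER = 0xE5  # E5h: first byte of a deleted FAT directory entry

def delete_entry(name_field: bytearray) -> None:
    """Mark an entry deleted by overwriting the first character of its name."""
    name_field[0] = DELETED_MARKER

def undelete_entry(name_field: bytearray, first_letter: str) -> None:
    """Restore the name; the original first letter must be supplied by the user."""
    if name_field[0] != DELETED_MARKER:
        raise ValueError("entry is not marked as deleted")
    name_field[0] = ord(first_letter.upper())

# 8.3 name field: 8 characters of name plus 3 of extension, space-padded.
entry = bytearray(b"REPORT  TXT")
delete_entry(entry)
print(entry)               # bytearray(b'\xe5EPORT  TXT')
undelete_entry(entry, "r")
print(entry)               # bytearray(b'REPORT  TXT')
```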
Disadvantages of FAT
Performance slows down as more files are stored on the partition
Hard drive fragments easily
Lack of security
No access rights on files and directories (NTFS provides them)
File integrity problems
Lost clusters
Invalid files and directories
Allocation errors
NTFS
Overcomes FAT system limitations
Journaling file system
Keeps track of transactions performed
Rolls back transactions if errors are found
Uses a Master File Table (MFT)
Stores data about all files and directories
Similar to database table with records
Uses clusters
Reserves blocks of space to allow the MFT to grow
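The journaling idea can be shown with a toy model. Each change first records the old value in a log, so an interrupted transaction can be rolled back. This is only an analogy under stated assumptions. NTFS's actual log ($LogFile) records both redo and undo information for metadata changes. This sketch keeps only undo records in memory and uses a Python dict as a stand-in for the volume. The class and method names are hypothetical.

```python
_MISSING = object()  # marks "key did not exist before the transaction"

class JournaledStore:
    """Toy model of journaling: log undo information, then change the data."""

    def __init__(self):
        self.data = {}
        self.journal = []  # (key, old value) pairs for the transaction in progress

    def begin(self):
        self.journal.clear()

    def write(self, key, value):
        # Record the old value BEFORE changing anything (write-ahead logging).
        self.journal.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        self.journal.clear()  # transaction complete; undo records no longer needed

    def rollback(self):
        # Undo in reverse order to restore the state that existed at begin().
        for key, old in reversed(self.journal):
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self.journal.clear()

store = JournaledStore()
store.begin()
store.write("a.txt", "v1")
store.commit()
store.begin()
store.write("a.txt", "v2")
store.write("b.txt", "new")
store.rollback()   # pretend an error was detected mid-transaction
print(store.data)  # {'a.txt': 'v1'}
```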
Advantages of NTFS
File access is very fast and reliable
MFT allows system recovery from problems without losing significant amounts of data
Security is greatly increased over FAT
File encryption with EFS (Encrypting File System)
File compression reduces file size
Saves disk space
Disadvantages of NTFS
Large overhead
Not recommended for volumes less than 4 GB
Cannot access NTFS volumes from:
MS-DOS
Windows 95
Windows 98
Linux (without an NTFS driver such as NTFS-3G)
Comparing File Systems
Choosing correct file system
Operating system dependent
Rarely depends on hardware
NTFS: Windows XP or Vista
Supports drive sizes up to 16 TB (about 16,000 GB)
FAT: Windows 9x
Older small hard drives, small removable devices
UNIX/Linux
Many file system choices
Comparing File Systems (contd.)
Table 10-1, FAT16, FAT32, and NTFS compared
Comparing File Systems (contd.)
Table 10-2, Some UNIX/Linux file systems
File Organization
Topics covered:
File characteristics
How files are stored on disks and other media

Binary or Text
Text files
Consist of ASCII or Unicode characters
Typically read with word-processing programs or text editors
Easy to view and modify
Binary files
Computer readable (not human readable)
Coded and numeric information
More compact than text files
Examples: executable programs, applications, sound and image files
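The difference in compactness shows up when the same number is stored both ways. Below is a small Python sketch that uses the standard struct module. The value and the 4-byte little-endian integer layout are arbitrary choices for illustration.

```python
import struct

value = 1_234_567_890

as_text = str(value).encode("ascii")  # text: one byte per digit, human readable
as_binary = struct.pack("<i", value)  # binary: 4-byte little-endian signed integer

print(len(as_text), len(as_binary))   # 10 4
print(as_binary)                      # raw bytes, not human readable
print(struct.unpack("<i", as_binary)[0] == value)  # True
```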
Sequential or Random Access
Sequential storage
Data accessed one chunk after the other in order
Random storage
Data accessed in any order
Also called direct or relative access
Sequential or Random Access (contd.)
Figure 10-6, Sequential versus random access
Sequential Access
Starts at the beginning of the file and processes it through to the end
Writing is very fast
New data is added to the end of the file
Retrieving, inserting, deleting, or modifying data is very slow
Stores data in rows, like database records
Fields are separated by delimiters or given specific fixed sizes
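Both row formats can be read sequentially, record after record from the start of the file, until the wanted key turns up. The following Python sketch uses the standard csv and io modules. The sample records, the 4/8/8-character fixed-length layout, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the same two customer records in both layouts.
DELIMITED_DATA = "1001,Smith,Reno\n1002,Jones,Austin\n"
FIXED_DATA = "1001Smith   Reno    \n1002Jones   Austin  \n"

# Assumed fixed-length layout: ID in columns 0-3, name in 4-11, city in 12-19.
FIELDS = [(0, 4), (4, 12), (12, 20)]

def find_delimited(f, customer_id):
    """Scan comma-delimited rows from the start until the ID matches."""
    for row in csv.reader(f):
        if row[0] == customer_id:
            return row
    return None  # reached the end of the file without a match

def find_fixed(f, customer_id):
    """Scan fixed-length rows, slicing each field out by position."""
    for line in f:
        row = [line[start:end].rstrip() for start, end in FIELDS]
        if row[0] == customer_id:
            return row
    return None

print(find_delimited(io.StringIO(DELIMITED_DATA), "1002"))  # ['1002', 'Jones', 'Austin']
print(find_fixed(io.StringIO(FIXED_DATA), "1002"))          # ['1002', 'Jones', 'Austin']
```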
Sequential Access (contd.)
Figure 10-7, A comma can be used as a field delimiter
Sequential Access (contd.)
Figure 10-8, Data can also be in fixed-length format
Random Access
Provides faster access to large amounts of data
Stores fixed-length records (relative records)
Can mathematically calculate a record's position on the disk surface and go right to it
Can update records in place
May waste disk space
Partially filled or empty records still occupy a full record slot
Works well when a sequential record number can easily identify records
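With fixed-length records, a record's position is simply record_number × record_size. The program can seek straight to that position instead of reading everything before it. Below is a minimal Python sketch. An in-memory io.BytesIO stands in for a file opened in "r+b" mode. The 20-byte record size and the helper names are hypothetical.

```python
import io

RECORD_SIZE = 20  # hypothetical fixed record length, in bytes

def write_record(f, record_number: int, text: str) -> None:
    """Write a record at its calculated position, padded to RECORD_SIZE."""
    data = text.encode("ascii")[:RECORD_SIZE].ljust(RECORD_SIZE, b" ")
    f.seek(record_number * RECORD_SIZE)  # jump straight to the record
    f.write(data)

def read_record(f, record_number: int) -> str:
    f.seek(record_number * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

f = io.BytesIO()
write_record(f, 0, "1001 Smith Reno")
write_record(f, 3, "1004 Lee Boise")     # slots 1 and 2 are never written
write_record(f, 0, "1001 Smith Carson")  # update in place, nothing shifts
print(read_record(f, 3))   # 1004 Lee Boise
print(read_record(f, 0))   # 1001 Smith Carson
print(len(f.getvalue()))   # 80: four record slots, two of them unused
```

The unused slots 1 and 2 show the wasted space mentioned in the list. The slots still occupy bytes even though they hold no data.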
Random Access (contd.)
Figure 10-9, Record organization and file access
Hashing
Used for accessing relative record files
Converts a record's key into a value called a hash key
Widely used in database management systems
Involves a hashing algorithm to generate a hash key for each record
Together, the hash keys establish an index to rows or records of information
Why Hash?
Allows a key field number not suited for relative file access to be converted into a relative record number
Example: phone numbers as keys in a customer information table
Divide the highest possible phone number by the expected number of customers to get a divisor
9999999999 / 2000 (estimated number of customers) = approximately 5,000,000
Dividing phone number 7025551234 by 5,000,000 gives the record number 1405 (fraction discarded)
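The division method in this example can be written in a few lines. The following Python sketch uses the numbers from the example. The constant and function names are hypothetical, and integer division discards the fraction.

```python
HIGHEST_PHONE = 9_999_999_999
EXPECTED_CUSTOMERS = 2000
# 9999999999 / 2000 = 4,999,999.9995, rounded to 5,000,000 as in the example.
DIVISOR = round(HIGHEST_PHONE / EXPECTED_CUSTOMERS)

def relative_record_number(phone: int) -> int:
    """Map a 10-digit phone number to a record slot by integer division."""
    return phone // DIVISOR

print(DIVISOR)                                # 5000000
print(relative_record_number(7025551234))     # 1405
print(relative_record_number(HIGHEST_PHONE))  # 1999, the last of 2000 slots
```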
Why Hash? (contd.)
Hashing may result in collisions
Same relative key is generated for more than one original key value
One solution:
Expand the algorithm to add the sum of the digits of the phone number to the relative key
Sum of the digits in phone number 7025551234 is 34
Original key 1405 + 34 = 1439
Lessens collisions but does not eliminate them
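The effect of the digit-sum refinement can be checked directly. The following Python sketch compares the plain division hash with the refined version. The phone numbers other than 7025551234 are hypothetical, chosen to show one collision that the refinement resolves and one that it does not.

```python
DIVISOR = 5_000_000  # 9999999999 / 2000, rounded as in the phone-number example

def basic_record_number(phone: int) -> int:
    return phone // DIVISOR

def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))

def improved_record_number(phone: int) -> int:
    """Division hash plus the sum of the key's digits."""
    return basic_record_number(phone) + digit_sum(phone)

a, b, c = 7025551234, 7025551235, 7020000998  # b and c are hypothetical customers
print(basic_record_number(a), basic_record_number(b))        # 1405 1405 (collision)
print(improved_record_number(a), improved_record_number(b))  # 1439 1440 (resolved)
print(improved_record_number(c))                             # 1439 (collides with a)
```

The digit sum can add up to 90, so the refined keys can exceed 1999. A file built this way would need a few spare slots beyond the 2000 estimated customers.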
Dealing with Collisions
Even the best hashing algorithms have collisions
One solution: create an overflow area
Records with duplicate record numbers are placed in the overflow area at the end of the file
Record retrieval
The hash key is calculated, and the record at the calculated position is retrieved
If the record at that location isn't the correct one, the overflow area is searched sequentially
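The overflow strategy can be sketched with a list of home slots plus an overflow list. This Python sketch is only an illustration. It swaps in a simple remainder hash (key % number of slots) for brevity rather than the phone-number hash from the example. The class name and sample keys are hypothetical.

```python
class RelativeFile:
    """Toy relative file: fixed home slots plus an overflow area."""

    def __init__(self, slots: int):
        self.slots = [None] * slots  # home area, indexed by hash value
        self.overflow = []           # records whose home slot was already taken

    def _hash(self, key: int) -> int:
        return key % len(self.slots)  # assumed remainder hash, for illustration only

    def insert(self, key: int, record: str) -> None:
        pos = self._hash(key)
        if self.slots[pos] is None:
            self.slots[pos] = (key, record)
        else:
            self.overflow.append((key, record))  # collision: send to overflow

    def find(self, key: int):
        entry = self.slots[self._hash(key)]
        if entry is not None and entry[0] == key:
            return entry[1]                # found at the calculated position
        for k, record in self.overflow:    # otherwise search overflow sequentially
            if k == key:
                return record
        return None

table = RelativeFile(10)
table.insert(1234, "Smith")  # 1234 % 10 = 4, stored in home slot 4
table.insert(5674, "Jones")  # also slot 4: collision, goes to overflow
print(table.find(5674))      # Jones
print(table.overflow)        # [(5674, 'Jones')]
```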

Dealing with Collisions (contd.)
Figure 10-10, An overflow area helps resolve collisions
Hashing and Computing
Efficient hashing algorithms
Important to companies producing database management systems
Many different hashing algorithms are used in computing
Encryption and decryption
Indexing
Many programming languages have specialized libraries of built-in hashing routines
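Python is one example of a language with built-in hashing routines. Its standard hashlib module provides cryptographic hash functions such as SHA-256. The built-in dict type is itself a hash table that indexes values by key. Below is a short illustration; the customer data is hypothetical.

```python
import hashlib

# Cryptographic hash from the standard library.
digest = hashlib.sha256(b"abc").hexdigest()
print(digest)       # ba7816bf... (the published SHA-256 test value for "abc")
print(len(digest))  # 64 hex characters = 256 bits

# The built-in dict hashes each key to locate its entry directly.
customers = {7025551234: "Smith", 7025551235: "Jones"}
print(customers[7025551235])  # Jones
```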
One Last Thought
A computer system's worth
Often measured in terms of the data stored on its hard drives
Data can be difficult to replace
Data storage depends on file systems
A strong understanding of file systems allows better data availability and protection
Summary
Hard drive
Random access device
Stores information in tracks and sectors
Accesses data through read/write heads
File system
Responsible for creating, manipulating, renaming, copying, and removing files from a storage device
Windows uses either FAT or NTFS
Summary (contd.)
FAT keeps track of which files are using specific clusters
Vulnerable to disk fragmentation
NTFS uses MFT to keep track of files and directories
Used with Windows
NTFS advantages over FAT
Better reliability and security, journaling, file encryption, and file compression
Summary (contd.)
Linux can be used with many file systems
Files contain binary or text (ASCII or Unicode) data
Data is usually stored and accessed either sequentially or randomly (relative access)
Hashing
Common method for accessing a relative file
Collisions occur when the same relative record location is generated for more than one key
