


VSAM stands for Virtual Storage Access Method. In Windows and many other operating systems, user data is stored in files. In a text file, on Windows or Linux, the data consists of lines/records one after the other. Such files are called sequential files. VSAM overcomes some of the limitations of conventional file systems like ISAM, and is hence a major boon.

What are sequential files?

In the early days of computers, all data was stored in sequential files (PS datasets). Data was stored in the form of records, one after the other. Suppose we wanted to store information about all the employees in our organization. Below is a picture of what a sequential file looks like:

As you can see, each record represents the data of a single, individual employee. This way, there would be thousands of records that make up a sequential file.

How do you search for a particular record in a sequential file?

Well, that's the tough part, because sequential datasets work more or less like a cassette tape. Yup, an audio cassette tape. If you want to play a particular song, you have to start from the beginning of the tape and wind through it until you reach the desired song. You can't jump directly to a song and play it. You have to scan forward through the tape until you reach the desired place.

On the same lines, when you want to search for a particular record, say Employee No. 502, you have to travel through the entire list of records, one by one, until you reach the desired record. The longer the sequential file, the longer it takes to access the record. You just don't know where the record lies hidden in such a huge list/dataset. So searching for or retrieving individual records is slow.
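
The cost of a sequential search can be sketched in a few lines of Python. This is only an illustration: the `records` list and the `sequential_search` function are hypothetical names made up for this sketch, and a Python list stands in for the sequential file.

```python
# Hypothetical sequential file: one record per employee, stored one after the other.
records = [{"emp_no": n, "name": f"Employee {n}"} for n in range(1, 1001)]

def sequential_search(records, emp_no):
    """Scan from the first record onwards; return (record, number_of_reads)."""
    reads = 0
    for record in records:
        reads += 1
        if record["emp_no"] == emp_no:
            return record, reads
    return None, reads  # reached the end of the file without a match

record, reads = sequential_search(records, 502)
print(record["name"], "found after", reads, "reads")
```

In this toy file the number of reads grows with the position of the record, which is exactly the cassette-tape behaviour described above.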

Let me give you another analogy: you can compare a sequence of records stored in a sequential file to a stack of books arranged in a cupboard in a library. Whatever is awkward to do with that stack of books is just as awkward to do with the records in a sequential file.

Is there a better way of storing the data?

You can use a more structured and organised way of storing this data, called the Virtual Storage Access Method. Though the abbreviation, VSAM, is a little geeky, it solves both our problems with sequential files.

Apart from this, VSAM has other advantages to offer, for example:
- Free space within a dataset is reclaimed automatically

What are the types of VSAM datasets?

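VSAM datasets are primarily of 3 types:
1. Entry Sequenced Dataset (ESDS)
2. Key Sequenced Dataset (KSDS)
3. Relative Record Dataset (RRDS)

These are also referred to as the ESDS Cluster, the KSDS Cluster and the RRDS Cluster.

Entry Sequenced Dataset (ESDS)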

An Entry Sequenced Dataset is similar to a Sequential File/PS Dataset. Records are stored one after the other, in the order in which they are entered. As you enter more records, they are simply appended to the end, and the ESDS dataset grows in size.

I shall talk about ESDS at greater length later in the article.

Key Sequenced Dataset (KSDS)
- Key field: In a KSDS, every record is identified by a unique key field. For example, every employee record has a key. This key could be the Employee Identification No., since it is unique for each employee.

- Loading data: When you first create a KSDS cluster, it is initially empty. You must load data into the KSDS dataset. When data is being loaded into a KSDS dataset, the records must be supplied in ascending order of key.

Let me cite a simple, yet practical example. Suppose you have an empty VSAM KSDS Cluster. You've written a batch job/JCL to read input records of employees and add them to the KSDS dataset. You first add Employee E1 with key 01 to the KSDS file. It works. Next, you add Employee E2 with key 03 to the KSDS file. It works. Next, you try to add Employee E3 with key 02 to the KSDS file. The insert is rejected, because key 02 is lower than the last key loaded (03).
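
The loading example can be mimicked with a small Python model. This is a toy sketch, not a VSAM or IDCAMS interface: the `KsdsLoader` class and its `add` method are hypothetical names, and the only rule it models is that keys must arrive in ascending order during the initial load.

```python
class KsdsLoader:
    """Toy model of an initial load: keys must arrive in ascending order."""

    def __init__(self):
        self.records = []      # (key, data) pairs, in load order
        self.last_key = None   # highest key accepted so far

    def add(self, key, data):
        """Return True if the record was accepted, False if it was rejected."""
        if self.last_key is not None and key <= self.last_key:
            return False       # out of sequence (or duplicate key)
        self.records.append((key, data))
        self.last_key = key
        return True

loader = KsdsLoader()
results = [
    loader.add("01", "E1"),
    loader.add("03", "E2"),
    loader.add("02", "E3"),   # key 02 arrives after key 03
]
print(results)
```

The third call is the out-of-sequence insert from the example: it returns False and the record is not stored.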

- KSDS components: A KSDS Cluster contains two parts:

1. Data Component: stores the file records (actual data).
2. Index Component: keeps track of the location of each record in the data component.

Given below is a rough sketch that gives you a helicopter view of what a KSDS Cluster looks like. Of course, the details are explained further ahead.

[Figure: rough sketch of a KSDS Cluster]

As shown in the figure above, in essence, KSDS Cluster = Data Component + INDEX Component.

Data Component: The Data Component contains the data records. Every record is stored in one mainframe storage/memory location. Think of storage locations as houses. If you know the house address, you can reach the house. In the same way, storage locations in memory have unique addresses by which they can be accessed. If you know the address of a location, you can access it easily, in much less time.

INDEX Component:
Now imagine you didn't have an index in a book, and you wanted to find a keyword. You would have to read and scan the entire book, page by page, until you found the word you wanted. The INDEX simplifies this activity. The INDEX Component of a KSDS works much like the index of a book. Basically, a book index has two columns: one is the keyword, and the other is the page number/location in the text where you'd find this keyword. Every page has a page number. On the same lines, every record in storage has an address. So, in this simplified picture, the INDEX Component of a KSDS has an entry for every key field. For example, employees 1, 2 and 3 each have an entry in the index.

So, how does it work? Let's say you wanted to find the name of Employee No. 4. Simple: you look up the row for Employee 4 in the index. This is easy, because the index is already sorted on the key field (Emp No.). The index entry gives the address of the record in the Data Component, so the record can be fetched directly from that address.

The gist of this concept is that the Index Component stores the key field and pointers (addresses in memory) to the corresponding records. This way, searching is faster and easier.
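
Here is a minimal Python sketch of that idea, assuming a simplified index with one entry per record. The addresses, employee names and the `lookup` function are all made up for illustration; a dictionary stands in for the Data Component and two parallel sorted lists stand in for the INDEX Component.

```python
import bisect

# Data Component: hypothetical storage addresses mapped to records.
data_component = {
    0x0A00: {"emp_no": 1, "name": "Asha"},
    0x0B00: {"emp_no": 2, "name": "Ravi"},
    0x0C00: {"emp_no": 3, "name": "Meena"},
    0x0D00: {"emp_no": 4, "name": "Dev"},
}

# INDEX Component: keys kept in sorted order, each with the address of its record.
index_keys = [1, 2, 3, 4]
index_addresses = [0x0A00, 0x0B00, 0x0C00, 0x0D00]

def lookup(emp_no):
    """Binary-search the sorted index, then fetch the record by its address."""
    pos = bisect.bisect_left(index_keys, emp_no)
    if pos == len(index_keys) or index_keys[pos] != emp_no:
        return None  # key is not in the index
    return data_component[index_addresses[pos]]

print(lookup(4))
```

Because the index is sorted on the key, the lookup never has to walk through the data records themselves.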

Relative Record Dataset (RRDS)
A Relative Record Dataset is organised as a collection/list of records. Each record has a Relative Record Number (RRN). The RRN indicates the relative position of the record in the file. Think of people standing in a long queue to get tickets. An RRDS resembles this: VSAM numbers the records starting from 1, so the person who is 6th in line (5 places behind the first person in the queue) has RRN 6. Note that in Relative Record Datasets there is NO KEY FIELD. Instead, it is up to the user to draw a relationship between any UNIQUE field of the record and the RRN.
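
The following Python sketch shows how access by RRN can work for fixed-length slots. It assumes slots are numbered from 1 (VSAM's convention) and a slot size of 80 bytes; `RECORD_LENGTH`, `slot_offset`, `rrn_for_employee` and the employee-number mapping are hypothetical choices made only for this illustration.

```python
RECORD_LENGTH = 80  # hypothetical fixed slot size, in bytes

def slot_offset(rrn, record_length=RECORD_LENGTH):
    """Byte offset of the slot for a 1-based relative record number."""
    if rrn < 1:
        raise ValueError("RRN starts at 1")
    return (rrn - 1) * record_length

def rrn_for_employee(emp_no, first_emp_no=1001):
    """User-defined mapping from a unique field to an RRN (an assumption)."""
    return emp_no - first_emp_no + 1

slots = [None] * 10  # ten empty slots

def put(rrn, record):
    slots[rrn - 1] = record

def get(rrn):
    return slots[rrn - 1]

put(rrn_for_employee(1006), "Employee 1006")
print(rrn_for_employee(1006), slot_offset(6), get(6))
```

Since there is no key field, the program itself decides which RRN a record gets; here employee 1006 lands in slot 6.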

What is a Control Interval (CI)?

The Control Interval (CI) is the basic unit of data transfer between the storage device and the machine. Hence, one should remember that the basic unit of I/O transfer in VSAM is the Control Interval. A VSAM Cluster could have thousands of Control Intervals. It is important to ensure that a record is written into the appropriate CI.

The default size of a Control Interval (CI) is 4K = 4096 bytes. However, the maximum size of a CI is 32K, whereas the minimum is 512 bytes.
When a new KSDS Cluster is created, Control Intervals are created and records are written into the Control Intervals.

Example with fixed-size records:

Assume that all our records have the SAME fixed size = 1024 bytes, and each CI = 4096 bytes. Then, in this example (setting aside the few bytes of control information that every CI also carries),
No. of records in each CI = 4096/1024 = 4 records/CI

Once again, it is not necessary that all the records are of the same fixed size. Records could be of any length.

CI with free space left over:

Consider the same example as above -

CI size = 4096 bytes, and all records are of length 1,270 bytes. In this case, the number of records that can be written to a Control Interval (CI) is 3. Some empty space is still left over.

1 CI will contain 3 records: 3 x 1,270 = 3,810 bytes.

Therefore, 4,096 - 3,810 = 286 bytes of FREESPACE.
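
The arithmetic in these two examples can be checked with a short Python helper. `ci_capacity` is a hypothetical name for this sketch. Its optional `control_bytes` parameter models the control information that a CI also stores (a 4-byte CIDF plus two 3-byte RDFs for fixed-length records, 10 bytes in all); the examples above leave it out for simplicity.

```python
def ci_capacity(ci_size, record_length, control_bytes=0):
    """Return (records_per_ci, free_bytes) for fixed-length records."""
    usable = ci_size - control_bytes
    records = usable // record_length
    free = usable - records * record_length
    return records, free

# The simplified figures used in the text (control information ignored).
print(ci_capacity(4096, 1024))        # (4, 0)
print(ci_capacity(4096, 1270))        # (3, 286)

# The same CIs once 10 bytes of control information are set aside.
print(ci_capacity(4096, 1270, 10))    # (3, 276)
print(ci_capacity(4096, 1024, 10))    # (3, 1014)
```

Note the last line: once control information is counted, only 3 records of 1024 bytes fit into a 4096-byte CI, so the round figure of 4 records/CI is best read as an approximation.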



What happens when you try to write the 4th record?

Let us study this process of filling up the Control Interval (CI) once again, in slow motion. The 4th record is 1,270 bytes in length. The Control Interval does not have sufficient space. A new Control Interval (CI) is created, and roughly one half of the records are transferred from the old Control Interval (CI) to the new one. This is called a CI split.
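
A CI split can be simulated with a few lines of Python. This is a simplified sketch, not how VSAM is implemented: `insert_with_split` is a hypothetical function, each CI is modelled as a list of (key, length) pairs, and control information is ignored.

```python
CI_SIZE = 4096  # bytes per Control Interval, as in the example

def insert_with_split(cis, ci_index, key, length):
    """Insert (key, length) into cis[ci_index]; split the CI if it overflows.

    Returns True if a split took place, False otherwise.
    """
    ci = cis[ci_index]
    ci.append((key, length))
    ci.sort()                                # keep records in key order
    if sum(size for _, size in ci) <= CI_SIZE:
        return False                         # the record fits, no split
    half = len(ci) // 2
    cis[ci_index] = ci[:half]                # lower half stays in the old CI
    cis.insert(ci_index + 1, ci[half:])      # upper half moves to a new CI
    return True

# One CI already holding three 1,270-byte records.
cis = [[(10, 1270), (20, 1270), (30, 1270)]]
split = insert_with_split(cis, 0, 25, 1270)
print(split, [[key for key, _ in ci] for ci in cis])
```

After the 4th record arrives, the single full CI becomes two half-full CIs, each with room for further inserts.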

Control information in a CI:
Apart from logical data records and FREESPACE, the Control Interval (CI) also stores some control information. The control information contains 2 pieces of information:
- Control Interval Definition Field (CIDF), 4 bytes: one CIDF per CI. It stores information about the amount and location of free space in the CI.
- Record Definition Field (RDF), 3 bytes: used to describe the length of the records.
When all records have the same fixed length, there are 2 RDFs: one stores the record length, and the other stores how many records of that length there are.

Spanned records:
What happens if a record is longer than the CI size? For example, in the above scenario, suppose the CI size = 4096 bytes and the record length = 5,120 bytes. Then the record extends into another CI. Such a record, which occupies more than one Control Interval, is called a spanned record, since it spans multiple Control Intervals.

Carrying forward the above example, the spanned record will completely fill up the first CI of size 4,096 bytes. The second CI will be partially filled. The spanned record will occupy 5,120 - 4,096 = 1,024 bytes of the second CI.

You might be tempted to think that since the second CI has a lot of free/vacant space left, 4,096 - 1,024 = 3,072 bytes, it can accommodate other records. However, this is not so. The leftover space in that CI cannot be used by any other record.
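
The spanned-record arithmetic generalises easily. The helper below is a hypothetical sketch (`spanned_layout` is not a VSAM term) that ignores control information, just as the example does.

```python
import math

def spanned_layout(record_length, ci_size):
    """Return (cis_needed, bytes_in_last_ci, unused_bytes_in_last_ci)."""
    cis_needed = math.ceil(record_length / ci_size)
    last_segment = record_length - (cis_needed - 1) * ci_size
    unused = ci_size - last_segment
    return cis_needed, last_segment, unused

print(spanned_layout(5120, 4096))   # (2, 1024, 3072)
```

The third value is the space that stays vacant in the last CI and is not available to other records.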

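Control Area (CA):

Control Intervals are grouped together into Control Areas (CA). When the MVS operating system allocates space to a VSAM Cluster, it does so in terms of Control Areas. So the Control Area is the unit of space allocation.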

It is atomic, which means that when MVS allocates storage space to VSAM, it will allocate space as 1 Control Area, 2 Control Areas, 5 Control Areas, and so on. You can't allocate 1.5 Control Areas of space to a VSAM file.

Control Area (CA) split:

Let us draw a parallel with the Control Interval. Let's say you have a record which does not fit into a Control Interval. So you would try to make a new Control Interval (CI), and a CI split is likely to occur. But, to add to it, there's not enough space in the Control Area to create a new Control Interval. Now what happens? A new Control Area is created, and half of the Control Intervals are transferred from the original Control Area to the new Control Area. This process is called a CA split.

Thus, when there's not enough room inside a Control Area for a CI, a new Control Area is created and half of the CIs are put into it.
What does a KSDS Cluster look like?
Sure. A KSDS Cluster is a collection of Control Areas. Each Control Area contains fixed-length blocks of the same size, called Control Intervals, which in turn contain the logical records.

What does the INDEX Component of a KSDS Cluster look like?

The INDEX Component of a KSDS has 2 parts: the Index Set and the Sequence Set. The Sequence Set is the lowest level of the INDEX Component. It contains 1 record for each Control Area (CA), which holds the highest key of every Control Interval (CI) in that CA, along with the physical location (address) of that CI on the disk.

The Index Set of a KSDS has one or more levels. The highest level contains only 1 record. Records at the top level point to the records at the next level, which in turn point to the records at the level below, and so on, down to the Sequence Set.

For every Control Area in the Data Component, there is one record in the Sequence Set. For each Control Interval inside that Control Area, there is one index entry in the Sequence Set record.

Each index entry holds the highest key value in the corresponding CI.

Consider the following Structure :

For example,
In Control Area-1:
For Control Interval-1, Highest Key Value = 17
For Control Interval-2, Highest Key Value = 247
For Control Interval-3, Highest Key Value = 369
These will be the entries in the Sequence Set record corresponding to Control Area 1.

The entries in the Index Set Record contain the highest key value of the Sequence Set records.

Suppose we wish to access record 501.
1. Look up the entry in the Index Set record to find the appropriate Control Area.
2. Look up the entry in the Sequence Set record to find the corresponding Control Interval.
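
The two lookup steps can be sketched in Python as follows. This is an illustration only: the Control Area 1 keys come from the example above, while the Control Area 2 keys, the labels and the function names (`first_at_or_above`, `locate`) are assumptions made up for this sketch. It also assumes a single-level Index Set.

```python
import bisect

# Sequence Set: one record per Control Area.
# Each entry is (highest key in the CI, CI label).
sequence_set = {
    "CA-1": [(17, "CI-1"), (247, "CI-2"), (369, "CI-3")],   # from the example
    "CA-2": [(412, "CI-1"), (538, "CI-2"), (690, "CI-3")],  # hypothetical
}

# Index Set record: the highest key of each Sequence Set record.
index_set = [(369, "CA-1"), (690, "CA-2")]

def first_at_or_above(entries, key):
    """Return the label of the first entry whose highest key is >= key."""
    highs = [high for high, _ in entries]
    pos = bisect.bisect_left(highs, key)
    return entries[pos][1] if pos < len(entries) else None

def locate(key):
    """Step 1: Index Set gives the CA. Step 2: Sequence Set gives the CI."""
    ca = first_at_or_above(index_set, key)
    if ca is None:
        return None  # key is higher than every key in the dataset
    return ca, first_at_or_above(sequence_set[ca], key)

print(locate(501))   # ('CA-2', 'CI-2')
```

Once the CI is known, it is read into memory and the record is searched for within that CI.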

It is important to discuss the implications of the CI size on performance. The more CIs there are, the more I/O transfer operations are required.

If access is RANDOM ===> then the CI size should be small ===> Reason: within the CI, the record is searched sequentially.

If access is SEQUENTIAL ===> then the CI size should be large ===> the no. of CIs will be less ===> I/O transfer operations will be less.
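
To put rough numbers on the sequential case, the sketch below counts how many CIs must be transferred to read a whole file at different CI sizes. The file of 10,000 records of 100 bytes each is a made-up example, `cis_to_read` is a hypothetical helper, and control information is ignored.

```python
import math

def cis_to_read(total_records, record_length, ci_size):
    """Number of CIs (I/O transfers) needed to read every record once."""
    records_per_ci = ci_size // record_length
    return math.ceil(total_records / records_per_ci)

for ci_size in (512, 4096, 32768):
    print(ci_size, cis_to_read(10_000, 100, ci_size))
```

The larger the CI, the fewer transfers a full sequential pass needs. For random access the trade-off runs the other way, since every lookup transfers a whole CI and then scans it for one record.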