Vous êtes sur la page 1sur 12

MANAGING DATA RESOURCES

ORGANIZING DATA IN A TRADITIONAL FILE ENVIRONMENT

Information is becoming as important a business resource as money, material, and people. Businesses are realizing
the competitive advantage they can gain over their competition through useful information, not just data.

An effective information system provides users with timely, accurate, and relevant information. This information is
stored in computer files. When the files are properly arranged and maintained, users can easily access and retrieve
the information they need.

FILE ORGANIZATION TERMS AND CONCEPTS

A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records,
files, and databases.

• Bit: A bit represents the smallest unit of data a computer can handle. Such as 0 and 1.
• Byte (character): A byte is a group of bits. A character may be – but is not necessarily – the same as a
byte. A character is a single letter, number, or special character such as ;, $, or %.
• Field: A field is a unit of data consisting of one or more characters. An example of a field is your name, your
address, or your Social Security number.
• Record: A record is a collection of related fields. An example of a record would be your name and address
and Social Security number.
• File: A file is a collection of related records. An example of a file is collected data on everyone employed in
the same department of a company, including all names, addresses, and Social Security numbers.
• Database: A database is a collection of related files. A company database might include files on all past
and current employees in all departments.

Figure: The data hierarchy

A computer system organizes data in a hierarchy that starts with the bit, which represent either a 0 or 1. Bits can be
grouped to form a byte to represents one character, number, or symbol. Bytes can be grouped form a field, and
related fields can be grouped to form a record. Related records can be collected to form a file, and related files can
be organized into a database.

Entity: A record describes an entity. An entity is person, place, thing, or event on which we maintain information. An
order is typical entity in a sales order file, which maintains information on a firm’s sales orders.

Attribute: Each characteristic or quality describing a particular entity is called an attribute. For example, order
number, order date, order amount, item number, and item quantity would each be an attribute of the entity order. The
specific values that these attributes can have can be found in the fields of the record describing the entity order.

Key Field: Every record in a file should contain at least one field that uniquely identifies that record so that the
record can be retrieved, updated, or sorted. This identifier field is called a key field. An example of a key field is the
order number for the order record. E n t i t y = O R D E R
A t t r i b u t e s

O r d e r O r d e r I t e m
Q u a n t i t y A m o u n t
n u m b e r d a t e n u m b e r

4 3 4 0 0 2 / 0 8 / 9 9 1 5 8 3 2 1 7 . 4 0 f i e l d s

k e y f i e l d

-1-
ACCESSING RECORDS FROM COMPUTER FILES

Computer systems store files on secondary storage devices. Records can be arranged in several ways on storage
media, and the arrangement determines the manner in which individual records can be accessed or retrieved. One
way to organize records is sequentially. In Sequential file organization, data records must be retrieved in the same
physical sequence in which they are stored. In contrast, Direct or random file organization, allows user to access
records in any sequence the desire, without regard to their actual physical order on the storage media.

When we were describing secondary storage, we talked about magnetic tape and disk storage for computer data. To
understand how information is accessed from these mediums, think about the difference between a music cassette
tape and a music CD. If you want to get to a particular song on a cassette tape, you must pass by all the other songs
sequentially. If you want to get to a song on CD, you can go directly to that song without worrying about any of the
others. That is the difference between sequential and direct access organization for database records.

The Indexed Sequential Access Method

The indexed sequential access method is such a file access method that directly access records organized
sequentially using an index of key fields. An index is a table or list that relates record keys to physical locations on
direct access files.

To explain the indexed sequential access method, let's go back to the example of the cassette tape. A cassette
tape label has a printed list of the songs contained on it which gives you a general idea of where to go on the tape to
find a particular tune. So too with computer records on a sequential access tape using the key field. It gives the
computer a pretty accurate idea of where a particular record is located. That's why it's so a important to have a
unique ID as the key field. You and your customer could have a difficult time if the key field is duplicated among
several records. Each key field and the ultimate location of that record on the storage device is maintained in the
index.
C y l i n d e r I n d e x

R e c o r d
2 3 0 | D a t a C y l i n d e rH i g h e s t K e y
K e y 1 2 0 0
f i e l d 2 3 9 2
3 5 8 8
. .
. .
. .
. .
T r a c k I n d e x T r a c k I n d e x
f o r C y l i n d e r 1 T r a c k I n d e x f o r C y l i n d e r 2 f o r C y l i n d e r 3

C y l i n d e rH i g h e s t K e y C y l i n d e rH i g h e s t K e y C y l i n d e rH i g h e s t K e y
1 2 0 0 1 2 0 0 1 2 0 0
2 3 9 2 2 3 9 2 2 3 9 2
3 5 8 8 3 5 8 8 3 5 8 8
. . . . . .
. . . . . .
. . . . . .
. . . . . .

Direct File Access Method

The direct file access method is used with direct file organization. This method employs a key field to locate the
physical address of a record. However, the process is accomplished using a mathematical formula called a transform
algorithm to translate the key field directly into the record’s physical storage location on disk. The algorithm performs
some mathematical computation on the record key, and the result of that calculation is the record’s physical address.
An example might be an on-line hotel reservation system.

Figure: The direct file access method

-2-
PROBLEMS WITH THE TRADITIONAL FILE ENVIRONMENT

The traditional file environment is a way of collecting and maintaining data in an organization that leads to each
functional area or division creating and maintaining its own data files and programs.

Many problems, such as data redundancy, program-data dependence, inflexibility, poor data security, and inability to
share data among applications, have occurred with traditional file environments.

Data Redundancy and Confusion

Data redundancy is the presence of duplicate data in multiple data files. Data redundancy occurs when different
divisions, functional areas, and groups in an organization independently collect the same piece of information. For
instance, within the commercial loans division of a bank, the marketing and credit information functions might collect
the same customer information. Because it is collected and maintained in so many different places, the same data
item may have different meanings in different parts of the organization.

Program-Data Dependence

The close relationship between data stored in files and the software programs that update and maintain those files.
Any change in data organization or format requires a change in all the programs associated with those files. In many
instances, applications work with outdated data simply because of the difficulty of making updates.

Lack of Flexibility

A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot
deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion. The information
required by ad hoc requests is “somewhere in the system” but is too expensive to retrieve. Several programmers
would have to work for weeks to put together the required data items in a new file. Users – in particular, senior
management – begin to wonder at this point why they have computers at all.

Poor Security

Because there is little control or management of data, access to and dissemination of result are virtually out of
control. What limits on access exist tend to be the result of habit and tradition, as well as of the sheer difficulty of
finding information.

Lack of Data Sharing and Availability

The lack of control over access to data in this confused environment does not make it easy for people to obtain
information. Because pieces of information in different files and different parts of the organization cannot be related
tone another, it is virtually impossible for information to be shared or accessed in a timely manner.

-3-
A MODERN DATABASE ENVIRONMENT

The key to establishing an effective, efficient database is to involve the entire organization as much as possible,
even if everyone seemingly will not be connected to it or be a user of it. Perhaps they won't be a part of it in the
beginning, but they very well could be later on.

DATABASE MANAGEMENT SYSTEMS (DBMS)

A Database Management System is basically another software program like Word or Excel or Email. This type of
software is more complicated: it permits an organization to centralize data, manage them efficiently, and provide
access to the stored data by application programs.

I N T E G R A T E D
H U M A N R E S O U R C E S
D A T A B A S E

E m p l o y e e s :
N a m e
A d d r e s s
S o c i a l S e c u r i t y N u m b e r
P o s i t i o n P e r s o n a l
M a r i t a l S t a t u s A p p l i c a t i o n P e r s o n n e l
P r o g r a m s D e p a r t m e n t

P a y r o l l :
H o u r s w o r k e d P a y r o l l
D a t a b a s e P a y r o l l
P a y r a t e A p p l i c a t i o n
G r o s s P a y M a n a g e m e n t S y s t e m D e p a r t m e n t
P r o g r a m s
F e d t a x
S t a t e T a x
N e t P a y B e n e f i t s
A p p l i c a t i o n B e n e f i t s
P r o g r a m s D e p a r t m e n t
B e n e f i t s :
L i f e I n s u r a n c e
P e n s i o n P l a n s
H e a l t h C a r e P l a n
R e t i r e m e n t b e n e f i t
Figure: The contemporary database environment

A single Human Resources Database serves multiple applications and also allows a corporation easily to draw
together all of the information on various applications. The database management system acts as the interface
between the application programs and the data.

A DBMS has 3 components, all of them important for the long-term success of the system.
 Data definition language
 Data manipulation language
 Data dictionary

Data Definition Language

The data definition language is the formal language used by programmers to specify the content and structure of the
database. The data definition language defines each data element as it appears in the database before that data
element is translated into the forms required by application programs.

Data Manipulation Language

This is a formal language used by programmers to manipulate the data in the database and make sure they are
formulated into useful information. The goal of this language should be to make it easy for users. The basic idea is to
establish a single data element that can serve multiple users in different departments depending on the situation.
Otherwise, you'll be tying up programmers to get information from the database that users should be able to get on
their own.

Data Dictionary

Each data element or field should be carefully analyzed to determine what it will be used for, who will be the primary
user, and how it fits into the overall scheme of things. Then write it all down and make it easily available to all users.
This is one of the most important steps in creating a good database.

-4-
Figure: Sample data dictionary report.

Figure shows a properly constructed data dictionary report. We can see exactly who owns the data element and all
the business functions that use the data element. It also lists the people who have access to the data element.

LOGICAL AND PHYSICAL VIEWS OF DATA

Perhaps the greatest different between a DBMS and traditional file organization is that the DBMS separates the
logical and physical views of the data, relieving the programmer or end user from the task of understanding where
and how the data are actually stored.

The database concept distinguishes between logical and physical views of data.

Logical View: Representation of data as they would appear to an application programmer or end user.

Physical View: The representation of data as they would be actually organized on physical storage media.

The logical description of the entire database, listing all of the data items and the relationship among them, is termed
the schema. The specific set of data from the database that is required by each application program is termed the
subschema. For example, for the Human Resources Database illustrated in the previous figure, the payroll
application would have a subschema consisting of employee name, address, social security number, and would
specify payroll data such as a pay rate and hours worked.

-5-
TYPES OF DATABASE ORGANIZATION

There are alternative ways of organizing data and representing relationships among data in a database.
Conventional DBMS use one of three principal logical database models for keeping track of entities, attributes, and
relationships. The three principal logical database models are hierarchical, network, and relational. Each logical
model has certain processing advantages and certain business advantages.

HIERARCHICAL DBMS

In a hierarchical database, files or records are arranged in related groups resembling a family tree, with “child”
records subordinate to “parent” records. A parent may have more than one child, but a child always has only one
parent. To find a particular record, you have to start at the top with a parent and trace down the chart to the child.

Hierarchical DBMSs work well when the data elements have an intrinsic one-to-many relationship, as might happen
with a reservation system. The difficulty, however, is that the structure must be defined in advance and is quite rigid.
There may be only one parent per child and no relationships among the child records.

One type of logical database model that organizes data in a treelike structure. A record is subdivided into segments
that are connected to each other in one-to-many parent-child relationships.

Figure: A hierarchical database for a human resources system

Figure shows a hierarchical structure that might be used for a human resources database. The root segment is
“Employee,” which contains basic employee information such as name, address, and identification number.
Immediately below it are three child segments: Compensation (containing salary and promotion data), Job
Assignment (containing data about job positions and departments), and Benefits (containing data about beneficiaries
and various benefit options). The Compensation segment has two children below it: Performance Ratings (containing
data about employee’s job performance evaluations) and Salary History (containing historical data about employee’s
past salaries). Below the Benefits segment are child segments for Pension, Life Insurance, and Health Care,
containing data about these various benefit plans.

NETWORK DATABASE MODEL

A network database is similar to a hierarchical DBMS, but each child record can have more than one parent record.
Thus, a child record may be reached through more than one parent.

This arrangement is more flexible than the hierarchical one. However, it still requires that the structure be defined in
advance. Moreover, there are limits to the number of links that can be made among records.

The network data model is a variation of the hierarchical data model. Indeed, databases can be translated from
hierarchical to network and vice versa in order to optimize processing speed and convenience. Whereas hierarchical
structures depict one-to-many relationships, network structures depict data logically as many-to-many relationships.
In other words, parents can have multiple “children” and a child can have more than one parent.

Figure: The network data model

This illustration of a network data model showing the relationship the students in a university have to the courses
they take represents and example of logical many-to-many relationships.

-6-
RELATIONAL DATA MODEL

The relational data model, the most recent of these three database models, overcomes some of the limitations of the
other two models. The relational model represents all data in the database as simple two-dimensional tables called
relations. The tables appear similar to flat files, but the information in more than one file can be easily extracted and
combined. Sometimes the tables are referred to as files.

In a relational database, each table contains a primary key, a unique identifier for each record. To make sure the
tables relate to each other, the primary key from one table is stored in a related table as a secondary key. For
instance, in the Customer table the primary key is the unique Customer ID. That primary key is then stored in the
Order Table as the secondary key so that the two tables have a direct relationship.

Customer Table Order Table


Field Name Description Order Number Primary Key
Customer Name Self Explanatory Order Item Self Explanatory
Customer Address Self Explanatory Number of Items Ordered Self Explanatory
Customer ID Primary Key-----> Customer ID Secondary Key
Order Number Secondary Key

 Select: create a subset of records meeting the stated criteria


 Join: combine related tables to provide more information than individual tables
 Project: create a new table from subsets of previous tables

The biggest problem with these databases is the misconception that every data element should be stored in the
same table. In fact, each data element should be analyzed in relation to other data elements with the goal of making
the tables as small in size as possible. The ideal relational database will have many small tables, not one big one.
On the surface that may seem like extra work and effort, but by keeping the tables small, they can serve a wider
audience because they are more flexible. This setup is especially helpful in reducing redundancy and increasing the
usefulness of data.

COMPARISON OF DATABASE ALTERNATIVES

Hierarchical and network databases can be very efficient as long as we plan ahead. But as we know, needs change,
and neither one of these databases offers a lot of flexibility to change with business needs. It's sort of like parents
and children; once you establish the tie, it's pretty hard to amend.

Relational database management systems are more flexible, especially if you keep the tables small. It is much easier
for non-techies to create the query language in a relational system. It's also easier to add new data elements,
although if we do, we'll have to go back and fill in the missing information for the old records or just forget them
altogether.

Table: Comparison of Database Alternatives

Table compares these alternatives on several dimensions to show we the advantages and disadvantages of each.

What we should remember is that none of these databases is very good if we don't keep the end user in mind. If
we're not careful, we'll wind up with lots of information that no one can use.

-7-
DESIGNING A DATABASE

Database design is done to meet both logical and physical organizational needs. Logical design refers to what the
database is (what is in it); physical design refers to how the data is organized and stored, and what storage hardware
is used.

Matching the Design to the Organization

Most users working with microcomputer-based DBMS software focus on a very specific set of objectives and
information processing needs-their own. For many small applications this is a satisfactory approach. However, the
objectives must be broader when working with a large corporation, a large computer system, and more complex and
sophisticated DBMS software. And the plans for the use of the database management system must be integrated
with the long-range plans for the company’s total information system. The information processing needs of the entire
corporation must be considered to build a corporate database that facilitat6es collecting, maintaining, and sharing
data among all organizational units. Once the general information needs have been established, the design process
can proceed. This process usually comprises two distinct phases of activity – the logical design phase and the
physical design phase.

Logical Design

Logical database design refers to what the database is, not to how it operates; in other words, the logical design is a
detailed description of the database model from the users’ perspective rather than the technical perspective. The
logical design of a database involves defining user information needs, analyzing data element requirements and
logical groupings, considering input and output methods, finalizing the design, and creating the data dictionary. The
focus is on identifying every element of data necessary to produce the required information systems reports and on
the relationship among the records.

Physical Design

Once the logical design of the database is ready, the next step is physical design. The physical database design
specifies exactly how the data will be arranged and stored on the direct access storage devices allocated for DBMS
use. The objective of the physical design is to store data so that it can be updated and retrieved as quickly and
efficiently as possible. DBMS users are not involved in the physical design of the DBMS, since that is determined by
the type of DBMS software they have purchased for their microcomputer or that is running on a large computer
system.

-8-
DATABASE ADMINISTRATION

Database Administration

The effective use of a database within an organization requires a great deal of cooperation and coordination. User
requirements and needs throughout an organization need to be frequently reviewed, and the overall security and
integrity of the database must be ensured. Organizations working with DBMS quickly recognized the need for a
database administrator (DBA), an individual or a group of individuals to coordinate all related database activities and
to control the database.

Why Administrator?

As we mentioned earlier, developing and implementing a corporate-wide database is a major task that requires
management’s complete support, substantial time from designers and users, and often large sums of money. This
task needs to be coordinated. In addition, the data in the database often represents the company’s most precious
resource: It must be managed well, so that it is not misused or damaged.

MANAGEMENT REQUIREMENTS FOR DATABASE SYSTEMS

Much more is required for the development of database systems than simply selecting a logical database model.
Indeed, this selection may be among the last decisions. The database is an organizational discipline, a method,
rather than a tool or technology. It requires organizational and conceptual change.

Without management support and understanding, database efforts fail. The critical elements in a database
environment are:

1. Data administration
2. Data planning and modeling methodology
3. Database technology and management, and
4. Users

This environment is depicted in the following figure and will now be described.

Figure: Key organizational elements in the database environment


Data Administration

A special organizational function for managing the organization’s data resources, concerned with information policy,
data planning, maintenance of data dictionaries, and data quality standards.

Although data administration is a very important organizational function, it has proven very challenging to implement,
as described in the Window on Management.

Data Planning and Modeling Methodology

Because the organizational interests served by the DBMS are much broader than those in the traditional file
environment, the organization requires enterprise-wide planning for data. Enterprise analysis, which addresses the
information requirements of the entire organization, is needed to develop databases. The purpose of enterprise
analysis is to identify the key entities, attributes, and relationships that constitute the organization’s data.

-9-
Database Technology and Management

Databases require new software and a new staff specially trained in DBMS techniques as well as new management
structures. Most corporations develop a database design and management group within the corporate information
system division that is responsible for the ore technical and operational aspects of managing data. The functions it
performs are called database administration. This group does the following:
 Define and organize database structure and content
 Develops security procedures to safeguard the database
 Develops database documentation
 Maintains the database management software

In close cooperation with users, the design group establishes the physical database, the logical relations among
elements, and the access rules and procedures.

Users

A database serves a wider community of users than traditional systems. Relational systems with fourth-generation
query languages permit employees who are not computer specialists to access large databases. In addition, users
include trained computer specialists. In order to optimize access for nonspecialists, more resources must be devoted
to training end users. Professional systems workers must be retrained in the DBMS language, DBMS application
development procedures, and new software practices.

THE JOB OF THE DATABASE ADMINISTRATOR (DBA)

The responsibilities for administering the database activities within an organization are usually assigned to an
individual or a small group, depending on the size of the organization and the scope and complexity of the database.
The database administrator has six major responsibilities.

1. Database design: The DBA plays a key role in both the logical and the physical design phases. He or she
guides the definition of the database content and the creation of the data dictionary, as well as setting data
classification and coding procedures and backup and restart/recovery procedures.

2. Database implementation and operation: The DBA guides the use of the DBMS on a daily basis. Among
other things, this includes adding and deleting data, controlling access to data, detecting and repairing
losses, instituting restart/recovery procedures when necessary, and assigning space used on secondary
storage devices.

3. Coordination with users: The DBA receives and reviews user requests for additional DBMS support that
have been forwarded by programming analysts. The administrator establishes feasibility, resolves
redundant or conflicting request, and assists in the process of establishing priorities for the requests. In
addition, the DBA is responsible for establishing and enforcing organization-wide DBMS standards for such
things as techniques for accessing data, formats in which data elements will be stored, and data element
names.

4. Backup and recovery: The DBA is responsible for preparing a plan to periodically back up the database(s)
and for establishing procedures for recovery fro the failure of the DBMS software or related hardware
components.

5. Performance monitoring: The DBA constantly monitors the performance of the DBMS using specialized
software to calculate and record operating statistics. If a problem occurs, such as a slowdown in
responsiveness, the DBA must identify the problem and take steps to improve the performance.\

6. System security: The DBA is responsible for designing and implementing a system that controls users’
access to the database files and determines which DBMS operations can be performed, as well as which
applications programs can be accessed. This system often involves the assignment of user identification
codes and passwords.

Organizations with a well-organized and well-staffed database administration department can be much more
successful with their database management systems than organizations without such a department.

- 10 -
ADVANTAGES AND DISADVANTAGES OF THE DBMS

A DBMS can minimize data redundancy, allow easy file updating, maximize data integrity and independence,
simplify maintenance, increase user productivity and data security, and standardize data definitions. However,
DBMSs require complex planning and expertise to create and maintain. They can also be expensive.

Advantages of the DBMS

 Minimization of data redundancy: More storage becomes available when maintenance of redundant
data elements among traditionally separate application files is minimized.

 Easy file updating and maximization of data integrity: In traditional systems in which the same
element of data was kept in several different files, ensuring that all copies of the data element were updated
when changes were made was a problem. When a data field needed to be changed, it had to he updated in
all the files in which it occurred. If some files were missed, data became inconsistent. When data is
inconsistent, data integrity is not maintained, and reports will be produced with erroneous information.

Data is no longer accurate, reliable, and/or timely. DBMSs make updating files much easier and so improve
the consistency of data, thus ensuring data integrity.

 Data independence and simplification of program maintenance: In a DBMS the programs are
much more independent of the data than in a traditional file processing systems. Previously, programs had
to include a substantial amount of information about the format and structure of the fields and records in
each file accessed. In a DBMS, this information is contained in the data dictionary.

 Increased user productivity: The ability of a DBMS to respond quickly to user requests for additional
information without involving the user in technical language manipulation encourages faster and more
efficient work. The report generators and query languages associated with database management systems
make them easy to use.

 Increased security: Control of access to and use of the database is easily established. With traditional
file processing systems, the data was too fragmented for effective security to be exercised.

 Standardization of data definitions: Before database management systems, each application


program could define similar elements of data with different names. However, the use of data dictionaries
standardizes the names and descriptions of data elements.

- 11 -
Disadvantages of the DBMS

In spite of the advantages of using a DBMS, there are circumstances in which such a system may involve
unnecessary overhead costs, as that would not be incurred in traditional file processing. The overhead costs of using
a DBMS are due to the following:

 High initial investment in hardware, software, and training.

 Overhead for providing security, concurrency control, recovery, and integrity functions.

 Other problems may arise if the database designers and DBA do not design the database properly or if the
database systems applications are not properly implemented. Hence, it may be more favorable to use
regular files under the following circumstances.

 The database and applications are simple, well defined, and not expected to change.

 There are stringent real-time requirements for some programs that may not be met because of DBMS
overhead.

 It is not required to access data by multiple-users.

WHO OWNS THE DATABASE

Here we need to say a few words about who owns a given type of database.

Individual Database: The individual database is basically a microcomputer database used by one person. The data
is usually stored on a large-capacity hard disk. A sales representative, for example, who is on the road a lot, may
build and maintain an individual database of customer and sales information.

Company Database: The company database (shared database) is shared by the users of one company in one
location. Company databases can be found in local area networks. The company owns the database. The data is
usually stored on a microcomputer and managed by a database administrator. Users are linked to the database
through terminals or microcomputer workstations.

Distributed Database: The distributed database is shared by the users of one company, which owns the database,
but the data is stored in several locations linked by a variety of communications networks.

Proprietary Database: The proprietary database is a huge database that functions as an information service, such
as CompuServe, Prodigy and Dow Jones New/Retrieval.

The proprietor owns the database in this case. To access this type of proprietary database, the user needs a modem
to hook the computer up to the service via the phone line. Another type of proprietary database can be “licensed”
(purchased to sue for a specified period of time) by computer users. For example, ABI’s Business Lists-on-Disc
(LCD) contains data on 9.2 million American businesses. It allows users to search for records by company name,
type of business, company size, geographic area, and use of Yellow Pages ads in phone books.

REVIEW QUESTIONS

1. Why is file management important for overall system performance?


2. Describe how indexes and key fields enable a program to access specific records in a file.
3. Define and describe the indexed sequential access method and the direct file access method.
4. List and describe some of the problems of traditional file environment.
5. Define a database and a database management system.
6. Name and briefly describe the three components of a DBMS.
7. What is the difference between a logical and a physical view of data?

8. List some of the benefits of DBMS.


9. List some of the limitations of DBMS.
10. Describe the three principal database models and the advantages and disadvantages of each.
11. Describe and briefly comment on the major management challenges in building a database environment.

- 12 -

Vous aimerez peut-être aussi