
Understanding Database Terminology

A computer cannot process data unless it is organized in specific ways: into characters, fields, records, files, and databases. After reading this lesson, you should be able to:

Define the key terms needed to understand what a database is and how it is used.
Identify the purpose and role of characters in data processing.
Identify the purpose and role of fields in data processing.
Identify the purpose and role of records in data processing.
Identify the purpose and role of database files in data processing.
Identify the purpose and role of databases in data processing.
Identify the purpose and role of data management systems in data processing.
Identify the purpose and role of keys in data processing.

Character

A character is the most basic element of data that can be observed and manipulated. Behind it are the invisible data elements we call bits and bytes, which refer to the physical storage elements used by the computer hardware. A character is a single symbol such as a digit, a letter, or a special symbol (e.g., $, #, and ?).

Field

A field contains an item of data, that is, a character or a group of related characters. For instance, a grouping of related text characters such as "John Smith" makes up a name in the name field. Let's look at another example. Suppose a political action group advocating gun control in Pennsylvania is compiling the names and addresses of potential supporters for a new mailing list. For each person, the group must record the name, address, city, state, zip code, and telephone number. A field would be established for each type of information in the list. The name field would contain all of the letters of the first and last name, the zip code field would hold all of the digits of a person's zip code, and so on. In summary, a field may contain an attribute (e.g., employee salary) or the name of an entity (e.g., person, place, or event).
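The character and field levels can be illustrated with a short Python sketch. It assumes plain ASCII text, where each character occupies one byte (eight bits). Other encodings may use more than one byte per character. The field names and sample values are hypothetical and are not part of the lesson's example data.

```python
# A minimal sketch assuming ASCII text: one character = one byte = eight bits.
# Field names and sample values below are hypothetical.

def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern the computer stores for one ASCII character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ord(ch)  # numeric code behind the visible symbol
    if code > 127:
        raise ValueError("this sketch handles ASCII characters only")
    return format(code, "08b")


# A field groups related characters; there is one field per type of information.
supporter_fields = {
    "name": "John Smith",
    "city": "Harrisburg",
    "state": "PA",
    "zip_code": "17101",
    "telephone": "717-555-0100",
}

for field_name, value in supporter_fields.items():
    print(f"{field_name:<10} {len(value):>2} characters: {value}")

# The zip code field should hold only digits.
print(supporter_fields["zip_code"].isdigit())
print(char_to_bits("$"))
```

Here the `$` symbol is stored as the bit pattern `00100100`, while the name field is simply ten such characters grouped together.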

Record

A record is composed of a group of related fields. Put another way, a record contains a collection of attributes related to an entity such as a person or product. In the list of potential gun control supporters, the name, address, zip code, and telephone number of a single individual would constitute a record. Similarly, a payroll record would contain the name, address, social security number, and title of one employee.

Database File

As we move up the ladder, a database file is defined as a collection of related records. A database file is sometimes called a table. For example, a file may contain the complete list of individuals on a mailing list, including their addresses and telephone numbers. Files are frequently categorized by the purpose or application for which they are intended; common examples include mailing lists, quality control files, inventory files, and document files. Files may also be classified by their degree of permanence: transaction files are only temporary, while master files are much longer-lived.

Database

Organizations and individuals use databases to bring independent sources of data together and store them electronically. In other words, a database is composed of related files that are consolidated, organized, and stored together. One collection of related files might pertain to employee information; another might contain sports statistics. Organizations and individuals may have and use many different databases, depending on the nature of their work. For example, a library database might consist of several related but separate databases, including book titles and author names, book descriptions, books on order, books checked out, and similar sets of information. Most organizations have product information databases, customer databases, and human resource databases that contain information about employees, salaries, home addresses, stock purchase plans, and tax deductions. In each case, the data stored in a database is independent of the application programs that use and process it.

Data Management System

Data management systems are used to access and manipulate data in a database. A database management system is a software package that enables users to edit, link, and update files as needs dictate. Database management systems will be discussed in greater detail in another lesson.

Key

To track and analyze data effectively, each record requires a unique identifier, called a key. The key must be unique to a particular record, just as each individual in the United States has a unique social security number. In fact, social security numbers are often used as keys in large databases. You might think that the name field would be a good key for a mailing list. However, it would not be, because some people might have the same name. A key must be identified or assigned for each record for computerized information processing to function correctly. An existing field may be used if its entries are entirely unique, such as a social security or telephone number. In most cases, however, a new field is created to hold the key, such as a customer number or product number.

Database Management System (DBMS)

DBMSs are the technology tools that directly support the management of organizational data. With a DBMS you can create a database, including its logical structure and constraints. You can also manipulate the data and information it contains, or directly create a simple database application or reporting tool. Through a user interface, human administrators use the DBMS to perform tasks such as creating a database, converting an existing database, or archiving a large and growing database. Business applications perform the higher-level tasks of managing business processes and interact with end users and other applications. To store and manage data, they rely on and directly operate their own underlying database through a standard programming interface such as ODBC. The following diagram illustrates the five components of a DBMS.

Database Engine: The Database Engine is the core service for storing, processing, and securing data. It provides controlled access and rapid transaction processing to meet the requirements of the most demanding data-consuming applications in an enterprise. Use the Database Engine to create relational databases for online transaction processing (OLTP) or online analytical processing (OLAP). This includes creating tables for storing data, as well as database objects such as indexes, views, and stored procedures for viewing, managing, and securing data. In Microsoft SQL Server, for example, you can use SQL Server Management Studio to manage database objects and SQL Server Profiler to capture server events.
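The following sketch makes these engine objects concrete. It uses SQLite through Python's built-in `sqlite3` module instead of SQL Server, an assumption chosen for portability. SQLite has tables, constraints, indexes, and views, but no stored procedures. The sketch creates a table with a key and a check constraint, adds an index and a view, and then shows the engine rejecting rows that violate the constraints. The table, column, index, and view names are hypothetical.

```python
import sqlite3

# SQLite stands in for a database engine here (an assumption; the lesson
# describes SQL Server). Table, index, and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,          -- key: unique per record
        customer_no INTEGER NOT NULL,
        amount      REAL NOT NULL CHECK (amount > 0)
    );
    -- An index speeds up lookups of orders by customer.
    CREATE INDEX idx_orders_customer ON orders (customer_no);
    -- A view is a stored query; it presents data without copying it.
    CREATE VIEW customer_totals AS
        SELECT customer_no, SUM(amount) AS total
        FROM orders
        GROUP BY customer_no;
    """
)

conn.executemany(
    "INSERT INTO orders (order_no, customer_no, amount) VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0)],
)
conn.commit()

totals = conn.execute(
    "SELECT customer_no, total FROM customer_totals ORDER BY customer_no"
).fetchall()
print(totals)

# The engine enforces the declared constraints.
rejected = []
for bad_row in [(1, 3, 9.0), (4, 3, -2.0)]:  # duplicate key, then a failed CHECK
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        rejected.append(bad_row)
print(rejected)
```

Querying the view looks the same as querying a table. This is part of how views help with "viewing, managing, and securing data": applications can be given the summary without the underlying rows.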

Data dictionary:

A data dictionary is a reserved space within a database that stores information about the database itself. It is a set of tables and views that users can read but cannot alter directly; the DBMS maintains it. Most data dictionaries contain a variety of information about the data used in the enterprise. In terms of the database representation of the data, the data dictionary defines all schema objects, including views, tables, clusters, indexes, sequences, synonyms, procedures, packages, functions, triggers, and many more. This ensures that all of these objects follow one standard defined in the dictionary. The data dictionary also records how much space has been allocated for, and is currently in use by, all the schema objects. A data dictionary is consulted to find information about users, objects, schemas, and storage structures. Every time a data definition language (DDL) statement is issued, the DBMS updates the data dictionary. A data dictionary may contain information such as:

Database design information
Stored SQL procedures
User permissions
User statistics
Database process information
Database growth statistics
Database performance statistics
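SQLite offers a simple, concrete analogue of a data dictionary. Its built-in `sqlite_master` catalog table records every table, index, view, and trigger in the database. The sketch below uses SQLite only as an example DBMS. It reads that catalog before and after two DDL statements, then shows that an ordinary statement cannot modify the catalog directly. The `employee` table and its index are hypothetical.

```python
import sqlite3

# SQLite is used as an example DBMS; the employee table is hypothetical.
conn = sqlite3.connect(":memory:")


def schema_objects(connection):
    """Read SQLite's catalog table, its closest analogue to a data dictionary."""
    return connection.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()


before = schema_objects(conn)  # a fresh database has no user-defined objects

# Each DDL statement below changes what the catalog records.
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after = schema_objects(conn)

print(before)
print(after)

# Ordinary statements cannot write to the catalog; only the DBMS maintains it.
try:
    conn.execute("DELETE FROM sqlite_master")
    catalog_writable = True
except sqlite3.Error:
    catalog_writable = False
print("catalog writable:", catalog_writable)
```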

Query Processor: A relational database consists of many parts, but at its heart are two major components: the storage engine and the query processor. The storage engine writes data to and reads data from the disk. It manages records, controls concurrency, and maintains log files. The query processor accepts SQL statements, selects a plan for executing them, and then executes the chosen plan. The user or program interacts with the query processor, and the query processor in turn interacts with the storage engine. The query processor isolates the user from the details of execution: the user specifies the result, and the query processor determines how that result is obtained. The components of the query processor include:

DDL interpreter
DML compiler
Query evaluation engine
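SQLite's `EXPLAIN QUERY PLAN` statement exposes the plan its query processor selects. In this sketch, SQLite is an assumed example and the table and index names are hypothetical. The same SQL returns the same result before and after an index is created. The plan changes, however, from a full table scan to an index search. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# SQLite is an assumed example DBMS; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supporter (id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT)")
conn.executemany(
    "INSERT INTO supporter (name, zip_code) VALUES (?, ?)",
    [("John Smith", "19104"), ("Ann Lee", "15213"), ("Raj Patel", "19104")],
)

# The user states WHAT result is wanted; the query processor decides HOW.
query = "SELECT name FROM supporter WHERE zip_code = ? ORDER BY name"
params = ("19104",)


def plan(connection, sql, sql_params):
    """Return the 'detail' column (index 3) of EXPLAIN QUERY PLAN output."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql, sql_params)]


plan_without_index = plan(conn, query, params)
result_without_index = conn.execute(query, params).fetchall()

conn.execute("CREATE INDEX idx_supporter_zip ON supporter (zip_code)")
plan_with_index = plan(conn, query, params)
result_with_index = conn.execute(query, params).fetchall()

print(plan_without_index)  # a full scan of the table
print(plan_with_index)     # a search using idx_supporter_zip
print(result_with_index)
```

The query text never mentions the index. The query processor chooses to use it, which is the isolation from execution details described above.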

Report writer: A report writer, also called a report generator, is a program, usually part of a database management system, that extracts information from one or more files and presents it in a specified format. Most report writers allow you to select records that meet certain conditions and to display selected fields in rows and columns. You can also format data into pie charts, bar charts, and other diagrams. Once you have created a format for a report, you can save the format specifications in a file and reuse them with new data.
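The core steps of a report writer can be sketched in plain Python: selecting records that meet a condition, laying selected fields out in columns, and saving the format for reuse. `ReportFormat`, its attributes, and the JSON file format are illustrative assumptions, not the design of any particular report generator.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ReportFormat:
    """Hypothetical saved report layout: title, fields, column widths, condition."""
    title: str
    fields: list[str]
    widths: list[int]
    min_amount: float  # select only records whose amount is at least this value


def run_report(fmt: ReportFormat, records: list[dict]) -> str:
    """Select records meeting the condition and lay out chosen fields in columns."""
    lines = [fmt.title]
    header = "".join(name.ljust(w) for name, w in zip(fmt.fields, fmt.widths))
    lines.append(header.rstrip())
    for rec in records:
        if rec["amount"] >= fmt.min_amount:
            row = "".join(str(rec[name]).ljust(w) for name, w in zip(fmt.fields, fmt.widths))
            lines.append(row.rstrip())
    return "\n".join(lines)


def save_format(fmt: ReportFormat, path: Path) -> None:
    """Save the format specification to a file (JSON is an assumed choice)."""
    path.write_text(json.dumps(asdict(fmt)))


def load_format(path: Path) -> ReportFormat:
    return ReportFormat(**json.loads(path.read_text()))


orders = [
    {"customer": "John Smith", "amount": 120.0},
    {"customer": "Ann Lee", "amount": 35.0},
    {"customer": "Raj Patel", "amount": 80.0},
]
fmt = ReportFormat("Orders of $50 or more", ["customer", "amount"], [12, 8], 50.0)

with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "large_orders.json"
    save_format(fmt, spec_file)
    reused = load_format(spec_file)  # same format, ready for new data

report = run_report(reused, orders)
print(report)
```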

Lesson 5: Types of Database Management Systems

DBMSs come in many shapes and sizes. For a few hundred dollars, you can purchase a DBMS for your desktop computer. Larger computer systems require much more expensive DBMSs. Many organizations lease mainframe-based DBMSs. DBMSs of this scale are highly sophisticated and would be extremely expensive to develop from scratch, so it is cheaper for an organization to lease such a DBMS than to build one. Because a variety of DBMSs is available, you should know the basic features, strengths, and weaknesses of the major types. After reading this lesson, you should be able to:

Compare and contrast the structure of different database management systems.
Define hierarchical databases.
Define network databases.
Define relational databases.
Define object-oriented databases.

Types of DBMS: Hierarchical Databases

There are four structural types of database management systems: hierarchical, network, relational, and object-oriented.

Hierarchical databases, commonly used on mainframe computers, have been around for a long time. They are one of the oldest methods of organizing and storing data, and some organizations still use them, for example for making travel reservations. A hierarchical database is organized in pyramid fashion, like the branches of a tree extending downward. Related fields or records are grouped together so that there are higher-level records and lower-level records, just as parents in a family tree sit above their children.
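The parent/child arrangement can be sketched as a tree in which every record holds a link to at most one parent and a list of children. The search below starts at the top record (the root) and works downward, parent to child. The lesson does not prescribe a traversal order, so the depth-first order used here is an assumption. The `Record` class and the reservation data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical node in a hierarchical database."""
    name: str
    parent: Optional[Record] = field(default=None, repr=False, compare=False)
    children: list[Record] = field(default_factory=list)

    def add_child(self, name: str) -> Record:
        child = Record(name, parent=self)  # a child is linked to exactly one parent
        self.children.append(child)        # a parent may have many children
        return child


def find_path(node: Record, target: str) -> Optional[list[str]]:
    """Search from the top down, parent to child; return the path to the target."""
    if node.name == target:
        return [node.name]
    for child in node.children:
        path = find_path(child, target)
        if path is not None:
            return [node.name] + path
    return None


root = Record("Reservations")  # top of the pyramid
flight_101 = root.add_child("Flight 101")
flight_101.add_child("Seat 12A")
flight_101.add_child("Seat 12B")
flight_202 = root.add_child("Flight 202")
seat_3c = flight_202.add_child("Seat 3C")

print(find_path(root, "Seat 3C"))
```

Because `parent` holds a single reference, the structure itself rules out a record with two parents. This mirrors the rigidity the lesson describes.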

Based on this analogy, the parent record at the top of the pyramid is called the root record. A child record always has exactly one parent record to which it is linked, whereas a parent record may have more than one child record linked to it. Each child can also be a parent with children of its own. Hierarchical databases work from the top down: a record search starts at the top of the pyramid and works down through the tree from parent to child until the appropriate child record is found. The advantage of hierarchical databases is that they can be accessed and updated rapidly, because the tree-like structure and the relationships between records are defined in advance. However, this feature is a double-edged sword. The disadvantage of this structure is that each child in the tree may have only one parent, and relationships or links between children are not permitted, even when they make sense from a logical standpoint. Hierarchical databases are so rigid in their design that adding a new field or record type requires that the entire database be redefined.

Types of DBMS: Network Databases

Network databases are similar to hierarchical databases in that they also have a hierarchical structure. There are a few key differences, however. Instead of looking like an upside-down tree, a network database looks more like a cobweb, or an interconnected network of records. In network databases, children are called members and parents are called owners. The most important difference is that each child (member) can have more than one parent (owner). Like hierarchical databases, network databases are principally used on mainframe computers. Because more connections can be made between different types of data, network databases are considered more flexible. However, two limitations must be considered when using this kind of database. First, as with hierarchical databases, the structure of a network database must be defined in advance. Second, there is a limit to the number of connections that can be made between records.

Types of DBMS: Relational Databases

In relational databases, data files are related through shared data values rather than through a predefined hierarchy. Hierarchical and network databases require the user to pass down through a hierarchy to access needed data. Relational databases instead connect data in different files by using common data elements, or key fields. This makes relational databases more flexible than either hierarchical or network databases. In relational databases, tables or files filled with data are called relations, a row or record is called a tuple, and columns are referred to as attributes or fields. Relational databases work on the principle that each table has a key field that uniquely identifies each row, and that these key fields can be used to connect one table of data to another. For example, one table might have rows consisting of a customer account number as the key field, along with an address and telephone number. The customer account number in this table could be linked to another table that also includes the customer account number. This second table contains information about product returns, including an item number (another key field). The item number can in turn be linked to a third table that contains item numbers and other product information such as production location, color, quality control person, and other data. Using this database, customer information can therefore be linked to specific product information. The relational database has become quite popular for two major reasons. First, relational databases can be used with little or no training. Second, database entries can be modified without redefining the entire structure. The downside of using a relational database is that searching for data can take more time than with other methods.
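The customer, product-return, and product example can be expressed directly in SQL. This sketch uses SQLite as a stand-in relational DBMS. All table names, column names, and values are hypothetical. Each table has its own key, and the join follows matching key values from a customer to a return to the returned product.

```python
import sqlite3

# SQLite stands in for a relational DBMS; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        account_no INTEGER PRIMARY KEY,   -- key field of this relation
        address    TEXT,
        telephone  TEXT
    );
    CREATE TABLE product (
        item_no             INTEGER PRIMARY KEY,
        production_location TEXT,
        color               TEXT,
        qc_person           TEXT
    );
    CREATE TABLE product_return (
        return_no  INTEGER PRIMARY KEY,
        account_no INTEGER REFERENCES customer (account_no),
        item_no    INTEGER REFERENCES product (item_no)
    );
    INSERT INTO customer VALUES (501, '12 Main St', '215-555-0101');
    INSERT INTO customer VALUES (502, '9 Elm St', '412-555-0102');
    INSERT INTO product VALUES (7001, 'Pittsburgh', 'red', 'A. Jones');
    INSERT INTO product VALUES (7002, 'Erie', 'blue', 'B. Kim');
    INSERT INTO product_return VALUES (1, 501, 7002);
    INSERT INTO product_return VALUES (2, 502, 7001);
    """
)

# Follow matching key values: customer -> product_return -> product.
rows = conn.execute(
    """
    SELECT c.account_no, c.telephone, p.item_no, p.production_location, p.qc_person
    FROM customer AS c
    JOIN product_return AS r ON r.account_no = c.account_no
    JOIN product AS p ON p.item_no = r.item_no
    ORDER BY c.account_no
    """
).fetchall()

for row in rows:
    print(row)
```

Adding a new column to `product` later would not require redefining `customer` or `product_return`. This illustrates the flexibility claimed above.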
Lesson 8: Data Mining, Data Warehousing, and Data Marts

Over the years, many large organizations have accumulated massive amounts of data about their customers, suppliers, products, and services. Even many new Web-based companies have amassed large databases about people and products as they have grown. The WWW is itself a large, distributed data repository with untold potential. With the growing realization that these vast data resources can be tapped for significant commercial gain, interest in data mining, data warehousing, and data marts has virtually exploded. After reading this lesson, you should be able to:

Compare data mining, data warehousing, and data marts.
Describe the purpose and value of data mining.
Describe the purpose and value of data warehousing.
Describe the purpose and value of data marts.

Data Mining (DM)

Data mining, also known as "knowledge discovery," refers to computer-assisted tools and techniques for sifting through and analyzing these vast data stores in order to find trends, patterns, and correlations that can guide decision making and increase understanding. Data mining covers a wide variety of uses, from analyzing customer purchases to discovering galaxies. In essence, data mining is the equivalent of finding gold nuggets in a mountain of data. The monumental task of finding hidden gold depends heavily upon the power of computers.

Applications of Data Mining

Data mining includes a variety of interesting applications. A few examples are listed below:

By recording the activity of shoppers in an online store such as Amazon.com over time, retailers can use the resulting patterns to improve the placement of items on a mail-order catalog page or Web page.
Telephone companies mine customer billing data to identify customers who spend considerably more than average on their monthly phone bill. The company can then target these customers to sell additional services.
Marketers can effectively target the wants and needs of specific consumer groups by analyzing data about customer preferences and buying patterns.
Hospitals use data mining to identify groups of people whose healthcare costs are likely to increase in the near future so that preventive steps can be taken.
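In its simplest form, the telephone-billing application compares each customer's bill with the average bill. The sketch below is a deliberately simple rule, not a full data mining technique. The 1.5 times threshold, the customer IDs, and the bill amounts are all assumptions for illustration.

```python
from statistics import mean


def high_spenders(monthly_bills: dict[str, float], factor: float = 1.5) -> list[str]:
    """Return customers whose bill exceeds `factor` times the average bill.

    The 1.5 default threshold is an illustrative assumption, not a standard.
    """
    average = mean(monthly_bills.values())
    return sorted(c for c, bill in monthly_bills.items() if bill > factor * average)


# Hypothetical monthly bills keyed by customer ID.
bills = {"C001": 40.0, "C002": 45.0, "C003": 150.0, "C004": 38.0, "C005": 52.0}
print(high_spenders(bills))
```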

Data Mining Summarized

In summary, the purpose of DM is to analyze and understand past trends and to predict future trends. By predicting future trends, business organizations can better position their products and services for financial gain. Nonprofit organizations have also achieved significant benefits from data mining, for example in scientific research. The concept of data mining is simple yet powerful. Its simplicity is deceptive, however: traditional methods of analyzing data, such as query-and-report approaches, cannot handle tasks of such magnitude and complexity.

The Need for Data Warehousing and Data Marts

The majority of databases are designed to hold the current data an organization needs to perform its business activities. In a business organization, current data might include information about bills due, inventory levels, and product orders, and would most likely be contained in a billing/inventory/order database. In most cases, as soon as data become outdated, they are deleted from the database. For example, once a bill is paid, data about the bill is removed. However, many organizations have realized the value of being able to analyze historical data in order to discover patterns of behavior and predict future trends. For example, analyzing historical data can tell a retailer what items were ordered, in what quantities, and by which customers. One of the keys to understanding the value of databases is to understand how one database, whether current or historical, can be related to another. It makes good business sense to relate customer data to inventory data (because customers place orders that affect inventory), and inventory data to supplier data (because suppliers provide inventory items). Many more examples like this could be named. The problem is that most databases are not designed to be accessed together in this fashion.

Data Warehousing and Data Marts

Many organizations now use data warehouses to bring multiple databases together and make them available for data mining and other forms of analysis. A data warehouse is a collection of data, usually both current and historical, from multiple databases that the organization can use for analysis and decision making. Its purpose is to bring key sets of data about, or used by, the organization into one place. However, bringing so much data together in a data warehouse can make analysis very difficult. To address this problem, organizations use what are called data marts.
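A data warehouse can be sketched as a separate database that periodically copies rows from several operational databases and keeps them after the sources delete them. The two source databases, their tables, and the snapshot date are hypothetical. SQLite in-memory databases stand in for real systems.

```python
import sqlite3

# Two hypothetical operational databases: billing and inventory.
billing = sqlite3.connect(":memory:")
billing.execute(
    "CREATE TABLE bill (bill_no INTEGER PRIMARY KEY, customer TEXT, amount REAL, paid INTEGER)"
)
billing.executemany(
    "INSERT INTO bill VALUES (?, ?, ?, ?)",
    [(1, "Ann Lee", 30.0, 1), (2, "Raj Patel", 55.0, 0)],
)

inventory = sqlite3.connect(":memory:")
inventory.execute("CREATE TABLE stock (item_no INTEGER PRIMARY KEY, on_hand INTEGER)")
inventory.executemany("INSERT INTO stock VALUES (?, ?)", [(7001, 12), (7002, 0)])

# The warehouse brings both sources into one place, tagged with a snapshot date.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE bill_history (snapshot TEXT, bill_no INTEGER, customer TEXT,
                               amount REAL, paid INTEGER);
    CREATE TABLE stock_history (snapshot TEXT, item_no INTEGER, on_hand INTEGER);
    """
)


def load_snapshot(snapshot: str) -> None:
    """Copy the current contents of each source database into the warehouse."""
    for row in billing.execute("SELECT bill_no, customer, amount, paid FROM bill"):
        warehouse.execute("INSERT INTO bill_history VALUES (?, ?, ?, ?, ?)", (snapshot, *row))
    for row in inventory.execute("SELECT item_no, on_hand FROM stock"):
        warehouse.execute("INSERT INTO stock_history VALUES (?, ?, ?)", (snapshot, *row))
    warehouse.commit()


load_snapshot("2024-01-31")  # hypothetical snapshot date

# The operational database deletes paid bills once they are outdated...
billing.execute("DELETE FROM bill WHERE paid = 1")
remaining = billing.execute("SELECT COUNT(*) FROM bill").fetchone()[0]

# ...but the warehouse still holds them for historical analysis.
paid_history = warehouse.execute(
    "SELECT customer, amount FROM bill_history WHERE paid = 1"
).fetchall()
print(remaining, paid_history)
```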
Data marts are related sets of data that are grouped together and separated out from the main body of data in the data warehouse. Data marts are designed to be made available to specific sets of users. For example, data about manufacturing can be put into a data mart and be made available to the production department. Human resource data can be put into another data mart and be provided to the human resources employees. This approach makes it easier for each group or constituency in the organization to access the data they need.
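A data mart can be sketched as a department-specific extract of the warehouse. This example copies one department's rows into a separate small database. A view over the warehouse would be a lighter-weight alternative. The table name, departments, metrics, and values are hypothetical.

```python
import sqlite3

# Hypothetical warehouse table holding data for several departments.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE fact (department TEXT, metric TEXT, value REAL);
    INSERT INTO fact VALUES ('manufacturing', 'units_produced', 1200);
    INSERT INTO fact VALUES ('manufacturing', 'defect_rate', 0.02);
    INSERT INTO fact VALUES ('human_resources', 'headcount', 85);
    INSERT INTO fact VALUES ('human_resources', 'open_positions', 4);
    """
)


def build_data_mart(department: str) -> sqlite3.Connection:
    """Copy one department's slice of the warehouse into its own small database."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE fact (metric TEXT, value REAL)")
    rows = warehouse.execute(
        "SELECT metric, value FROM fact WHERE department = ? ORDER BY metric",
        (department,),
    ).fetchall()
    mart.executemany("INSERT INTO fact VALUES (?, ?)", rows)
    mart.commit()
    return mart


production_mart = build_data_mart("manufacturing")   # for the production department
hr_mart = build_data_mart("human_resources")         # for human resources staff

print(production_mart.execute("SELECT metric, value FROM fact ORDER BY metric").fetchall())
print(hr_mart.execute("SELECT metric, value FROM fact ORDER BY metric").fetchall())
```

Each group queries only its own small mart, which is the ease of access the lesson attributes to data marts.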