
Tutorial on database modeling and the SQL language

About the Database and SQL tutorial


Building a database is like building a house - you have to get it right at the design
stage (also called the modeling stage). Once all the walls are up and the plumbing is
installed, it's a bit late to realize that the bathroom should have been three feet wider. It
can be done, but it's gonna cost you, baby!

So, in these lessons we're going to spend some time on database modeling. By studying
several examples of database applications we'll define what the rules are for building
a model of a database that we can then expand to cover all kinds of situations. Modeling is
what we do when we are designing a database. It's sort of like building a model car except
that it's not 3D and it's not plastic. Our model is on paper, or in a modeling program's data
file, but it does have a lot of parts that must be assembled in a logical fashion and must all
be glued together to work properly.

About the SQL language


Once the modeling part is down pat, we will create the databases and learn how to work
with them using the SQL language. SQL is the lingua franca of databases. It allows you to
communicate between relational databases from Microsoft, MySQL and Oracle, to name
just a few. We will look at all the important SQL commands that need to be mastered, again
through the use of many examples and exercises.

Throughout this tutorial we will use Microsoft Access 2003 (from Microsoft Office 2003
Pro) as the source of most of our examples. Almost all the sample databases are built in
Microsoft Access.

Gradually, we are including MySQL into the tutorial. The MySQL database server is a
powerful, very versatile server which is part of the whole open-source environment along
with other tools such as the Apache Web server, the PHP language and the Open Office
suite. All those tools are readily available free on the Internet. They are now a credible
alternative to the proprietary software sold by Microsoft and Oracle. And since the recent
announcement of a joint venture between Sun and Google, you can bet that we'll be seeing
a lot of action in that area in the coming months and years!

In addition to the database server itself, MySQL has a whole series of tools that work with
it. Most of the tools are developed by third-party vendors who usually sell them but there
are always free versions of these clients that help to make the developer's life a lot easier.
Since new tools are coming out all the time you may see different screen shots of the
examples in different lessons as we update things. We'll try to make it not too confusing!

To highlight some of the uses of the SQL language we've developed a sample project that
uses a Visual Basic 6 client to connect to our MySQL database through SQL commands.
Once you've mastered the SQL language you may want to look at the Visual Basic 6 ADO
database programming project which is an introduction to the use of a visual client in
extracting and manipulating data from a relational database via SQL.

Even if you are still a user of Sybase Powerbuilder (which is the subject of another
tutorial, in French, at WebProfesseur database programming projects) you can run all
the examples using SQL Anywhere which has all the tools necessary to create and
manipulate databases. As an aside, I used to work with Sybase Powerbuilder and I taught
courses on it. I loved it for development work. But for some reason it never caught on much
here in Canada and now I don't know of anyone who actually uses it. It's too bad.

And if you're looking for some great information on Visual Basic 6.0, you have to check out
this site: Visual Basic 6 programming tutorials.

Lesson 1 - Introduction to data modeling

A short history lesson
Once upon a time all files were stored on magnetic tape and all access was sequential. Then
came the disk drive and random access and there was joy in the land! But you know,
because your parents told you, that too much of a good thing is bad for you. And so it is
with random access - unless you organize your data it will be so randomized that you'll
never find it again.

So, somebody came up with a way to store large amounts of data in such a way that it
could be updated and retrieved when needed. They called the structure containing the data
a database and the programs doing all the administrative work of handling the data
a Database Management System (DBMS).

The first databases were modeled upon COBOL data structures (in those days every
programmer was a COBOL programmer) and were called hierarchical because of the way
in which the data are structured. Eventually they improved upon the first model and came
up with a network model, which has absolutely nothing to do with Novell or NT, but
describes the way the data elements relate to one another. Some of those large databases
are still in use today in legacy applications all over the place.

In the early 70's Dr. E.F. Codd, who happened to be a mathematician rather than a
programmer, came up with a new model he called relational. This relational model, built
on the mathematics of set theory, was powerful, flexible and easy to use. But it turned out
to be such a hog for disk space and processing time that it wasn't really a viable alternative
to the previous models. It wasn't until hardware performance improved in the late 70's that
the model started gaining acceptance.

By the mid 80's, cheap PC's with ever-increasing capabilities made it possible to develop
small versions of relational databases. It was Oracle Corp. that really put relational
database development on the map and today Oracle is still the leader in the field.

Starting the design process


• Step 1: Feasibility study

o Is the project too small? Too big? Technically feasible?

o What are the costs involved - for development, for maintenance?
Is it cost-effective - are the savings greater than the costs?

o Is the timeframe realistic? Can it be done in the time allotted?

• Step 2: Detailed analysis


o Start with the output and work backward. What output does the client want to
see? What input will be required to produce that output?
Describe the screens, the reports, the results that will be produced.

o Study the existing system (there's always something, even if it's done with
quill and ink).
Determine what must be kept, what must be changed and what must be
scrapped.

o Keep this in mind: Any system you deliver must perform at least as
well as the system the client is using now. Although that sounds simple
enough, I could come up with a pile of cases where an improved system took
twice as long to do the work and cost twice as much to operate as
the inadequate system it replaced.

• Step 3: Data modeling


o Create a model (drawing, graphic representation, schema) of the data. Use a
pencil and paper if you have to or, preferably, a software modeling tool. This
is the equivalent of blueprints for a house. It does not require much effort to
add or remove things from a drawing. It is a lot harder to do once the house
is built or the database is coded.

o The model is created with the help of the client. The client knows what needs
to be done although he may not know how it will be done - that's your
job. Always keep the client involved at the design stage.

Data modeling
• Definitions
o Entity: an object, a thing in the system about which data is kept - equivalent
to a file - it will be implemented as a table in the database.

o Attribute: an item of data referring to an entity - equivalent to a field - it will
be implemented as a column in a table in the database.

o Primary key: the attribute (or combination of attributes) that uniquely
identifies every occurrence of an entity.

o Relationship: the way entities link to one another.

• Examples

DEFINITION      EXAMPLE
Entity          Student, Professor, Class, RegisteredStudents in "School" application
                Customer in "Billing" application
                Employee in "Payroll" application
Attribute       Student_Id, Student_Name, Student_Major for "Student" entity
                Cust_Number, Cust_Ship_to_Address for "Customer" entity
                Employee_Salary for "Employee" entity
Primary key     Student_Id for "Student" entity
                Class_Number for "Class" entity
                Student_Id + Class_Number for "RegisteredStudents" entity
Relationship    Professor teaches Class
                Student registered in Class
                Customer orders Product

• Graphical representation
o The main tool in the modeling process is called an Entity-Relationship
diagram or E-R diagram for short. It shows all the
components we have been discussing: Entities, Attributes of Entities, Key
attributes of Entities and Relationships between Entities.

o There is one symbol that appears in the diagram that we haven't yet
discussed: the line with a crow's foot at the end.
The line tells us that the entities are related and the ends of the line describe
the degrees of relationship, also called cardinality. Degrees identify how
many occurrences of one entity are related to how many occurrences of
another entity. Degrees are expressed in one of 3 ways:
▪ One-to-one
▪ One-to-many
▪ Many-to-many
For example: the Student <--> Class relationship is many to many - a given
student (an occurrence of the Student entity) may take many classes
(occurrences of the Class entity) and each class may contain many students.
That is shown on the diagram by a crow's foot at each end of the line. The o
with the crow's foot says that a student may be signed up for no classes (a
football player?) and a given class may have no students (not offered this
term).

The Class <--> Professor relationship is one to many: a given professor may
teach zero or many classes but each class must have one and only one
professor.

If you were told that each Professor only teaches one Class and that each
Class only has one Professor, you would be looking at a one to
one relationship.

It is very important to describe the degrees of relationships accurately when
you do the preliminary design. The client will not say: "There is a one to
many relationship between Class and Professor". He'll tell you, if you bother
to ask: "In this School, every class only has one Professor assigned to it". In
another school, you may hear: "We're very proud of our team-teaching
approach. A class may be taught by several Professors working together."
There you're looking at a many to many relationship and you will have to
implement the database accordingly.

Lesson 2 - Database design short case study


ezconsulting Inc. is a small consulting company offering database design and creation
services to a fairly wide range of customers. The company employs about 30 consultants,
analysts, programmers, network specialists, who will work in teams on projects for periods
of time ranging from a few days to several months.

At any given time there may be 10-12 different projects on the go. Because resources are
scarce a specialist may be called upon to work on several projects simultaneously. In order
to keep some control over scheduling and costing, every employee is assigned to a
department and reports to only one manager, even when he's working on projects for other
departments. Every week every employee must submit a timesheet showing the number of
hours spent on each project.

As was the case for the shoemaker's children (they had no shoes because dad was too busy
making shoes to sell in order to put bread on the table), this company has
no Project Management database, simply because nobody has had the time to set one up.
And this is typical in this kind of environment. Do you take an analyst who bills $800/day
and put him to work on in-house maintenance? No, you don't. You wait until some bright
college student shows up for a co-op work assignment and you give him/her the job. The
company hopes that after the basic Project management application is operational other
modules such as Employee Skills Management and control of bids and RFP's can be
integrated into the database.

Designing the Project Management application
Here is what our first draft of the E-R diagram should look like for the Project Management
case:

Fig. 2-1

The diagram contains the information we would have gathered by talking to the client.
Notice that the attributes for Employee represent the minimum amount of information we
have to keep at this time. We haven't included things like "Home address", "Date of birth"
and so on. When we start working with SQL later we will add more information to the table.
The same applies for the other entities - we will add attributes as we develop the model later
on.

It is important to make sure at this point that you understand the degrees of the
relationships shown.
Department <--> Employee is a one-to-many relationship - a given employee is assigned
to one and only one department and a given department contains zero or many employees.
This means that every employee in the company will be assigned to a department, even the
President who will be in Administration. A department may exist and have no employees
assigned to it. For example, we could create a new department in the database and, until it
is staffed, it will have zero employees assigned to it.
Employee <--> Project is a many-to-many relationship - a given employee works on one
or many projects and a given project may have zero or many employees. In order to keep
track of every employee's hours, all the work that is done will be billed to a project.
However, projects could be things like "In-house systems development", "Professional
development leave" or "Administrative duties". Any project may have no employees working
on it.

Now, once you have the E-R diagram down, you go over it one more time with the client to
make sure that you have the details down correctly and you are almost ready to start
creating the actual database. Notice that I said "almost".

Normalization
It is possible to start creating the database at this point. It's just a question of creating a
new table for every entity identified in the diagram. We'll be using MS-Access to do that
shortly. But how do you code the relationships?

There is a formal process to do that in database modeling. It's called normalization. It
means applying a set of rules to the data so that you group the attributes in such a way
that the relationships work. It's not really that complicated but it is a formula approach. If
you prefer to use that approach, get any good book on databases, look up "normalization"
and follow the steps.

We'll do normalization using the intuitive approach - work with the data until it "feels" OK.
This could also be called prototyping - create a working model of the database that is close
to what you want and keep improving it until it works perfectly, then put it into production.

However, whatever the approach taken, there are some basic rules that have to be adhered
to. The rules apply to any relational database and cannot be broken. They can't even be
stretched. Think of them as the Prime directives. The rules are:

1. Every table must have a primary key - an attribute or combination of attributes
that uniquely identifies every occurrence in the table.

2. The primary key can never contain an empty or Null value. That makes sense -
if you had 2 that were empty, they wouldn't be unique anymore.

3. Every attribute of every occurrence in the table can contain only one
value. Think of the Employee table as a grid. Every occurrence, or line, represents
one employee and every column is an attribute. So, every employee can only have
one ID and one First-name and one Last-name, and so on.

The one-to-many relationship


Let's start with the easiest relationship: Employee <--> Department.

First we create a new database and call it ProjMgt.mdb. Then we create the first two
tables in the database: call them Employee and Department, and put in the fields from
the E-R diagram. Notice that the column-names in the tables are all coded with a prefix: e_
for Employee, d_ for Department and p_ for Project. This is a good habit to get into. It will
make your life easier later on. This is what we now have:
Fig. 2-2

Remembering that this is a one-to-many relationship, how do we associate the employee
with the department? There are 2 ways it could be done:

1. Add a column for Employee ID to the Department table. You get this:

Fig. 2-3

See the problem? When you start entering data, what do you put into the
d_Employee column? Rule 3 says you can have only one value. What if there are 2
employees in the Department? You could try to add another column for
d_Second_employee, but what if there are 20 employees, or 200? Obviously this is
not going to work. So we scrap this brilliant idea.

2. Add a column for Department to the Employee table. You get this:
Fig. 2-4

Any problem with this? Doesn't seem to be. Since every employee is assigned to only
one department, I only have one value to put into the column: employee 101 works
for department 10, and that's all.

In summary, to normalize a one-to-many relationship you add a column to the table at the
"many" end of the relationship to refer to the primary key at the "one" end.
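
The rule can be tried out in a few lines. What follows is only a sketch: it runs the SQL through Python's built-in sqlite3 module on an in-memory database instead of Access, and the column names (d_Id, d_Name, e_Id, e_Lname, e_Dept) and the sample names are assumptions, since the figures are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# "One" end of the relationship
conn.execute("CREATE TABLE Department (d_Id TEXT PRIMARY KEY, d_Name TEXT)")

# "Many" end: the extra column e_Dept refers to Department's primary key
conn.execute(
    "CREATE TABLE Employee ("
    " e_Id TEXT PRIMARY KEY,"
    " e_Lname TEXT,"
    " e_Dept TEXT NOT NULL REFERENCES Department(d_Id))"
)

conn.execute("INSERT INTO Department VALUES ('10', 'Administration')")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("101", "Smith", "10"), ("102", "Jones", "10")],
)

# One department, many employees, and still only one value per column (rule 3)
staff = sorted(
    conn.execute("SELECT e_Id FROM Employee WHERE e_Dept = '10'").fetchall()
)

# An employee assigned to a department that does not exist is rejected
try:
    conn.execute("INSERT INTO Employee VALUES ('103', 'Brown', '99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Each employee row carries exactly one department number, however many employees the department has.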

The many-to-many relationship


The many-to-many Employee <--> Project relationship is a bit trickier.

In the end we want to associate projects and employees, to see who is working on what
project. To see how it must not be done we'll go through the exercise of adding columns to
the tables. So we add the Project table to the relationships:

Fig. 2-5

To create the relationship we could add a Project_Number column to the Employee table.
When we try it we see that we come up with the same problem we had in the previous
relationship: when we get to the e_Project column, what do we write? The employee could
be working on 7 different projects. Rule 3 says we can only enter one value.

Fig. 2-6

So we try it the other way - add an Employee_Number column to the Project table. Again,
when we get to the p_Employee column what do we write? There could be 25 employees
working on this project.

Fig. 2-7

Since those two attempts obviously won't work, there has to be something else. It's called
a link entity or link table. Most textbooks will just call that table Employee-Project or
Project-Employee. But in real life the entity does exist in our system. What is it that links
employees with projects? Right! It's the timesheet. The timesheet contains all the
information we need. So we add the Timesheet entity to the mix and modify our E-R
diagram:

Fig. 2-8

t_Employee is an employee-ID that refers to the Employee table, t_Project is a
project_number that refers to the Project table, t_Date is the period_ending date for the
timesheet and t_Hours is the number of hours the employee spent on that project. We also
specify that every line in the Timesheet table must have one and only one employee_ID and
must have one and only one project_number. In other words we cannot create a Timesheet
for an employee who doesn't exist or charge for work on a project that doesn't exist. Who
would ever think of doing such a thing anyway!

What is the primary key for Timesheet? To get a feel for the key, let's look at the data that
will be input:

Fig. 2-9

It's clear that t_Employee or t_Project can't be the primary key because they both repeat;
remember: every occurrence in a primary key column must be unique. How about a
concatenation of t_Employee + t_Project? That looks good so we try it. It works fine for
one week. The following week, employee 202 has worked on project S4440 again and we
get a duplicate key error!

Fig. 2-10

So we add t_Date to the key and that solves the problem. Now, assuming that the client
has said that if an employee works on a project twice in one week he adds-up the hours,
the combination of employee + project + date is truly unique.

Conclusion: there is only one way to normalize a many-to-many relationship and
that is to create a link table. The link table must contain columns that refer back to the
other tables so that the many-to-many relationship becomes two one-to-many
relationships.
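
Here is a rough sketch of the Timesheet link table with its three-column key. It uses Python's built-in sqlite3 module rather than Access; the column names e_Id and p_Number, the dates and the hours are made up for illustration, while employee 202 and project S4440 come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Employee (e_Id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE Project (p_Number TEXT PRIMARY KEY)")

# The link table: two columns refer back to the other tables,
# and the key is the combination employee + project + date
conn.execute(
    "CREATE TABLE Timesheet ("
    " t_Employee TEXT NOT NULL REFERENCES Employee(e_Id),"
    " t_Project TEXT NOT NULL REFERENCES Project(p_Number),"
    " t_Date TEXT NOT NULL,"
    " t_Hours REAL,"
    " PRIMARY KEY (t_Employee, t_Project, t_Date))"
)
conn.execute("INSERT INTO Employee VALUES ('202')")
conn.execute("INSERT INTO Project VALUES ('S4440')")

# Same employee and project in two different weeks: accepted
conn.execute("INSERT INTO Timesheet VALUES ('202', 'S4440', '2005-01-07', 12)")
conn.execute("INSERT INTO Timesheet VALUES ('202', 'S4440', '2005-01-14', 8)")

# Same employee, project and week a second time: duplicate key
try:
    conn.execute("INSERT INTO Timesheet VALUES ('202', 'S4440', '2005-01-14', 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Timesheet").fetchone()[0]
```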

Lesson 3 - Introduction to the SQL language


The SQL language
SQL = Structured Query Language
Usually pronounced 'Sequel' or, sometimes, 'ess-queue-el'

Relational database manipulation language developed at IBM in the 1970's by Donald
Chamberlin and Raymond Boyce, on the basis of Dr. E.F. Codd's relational model.

Popularized by ORACLE

Advantage is that it allows databases that are not programmed the same to talk to each
other - it is the basis of Client/Server architecture.

A Client application written in Visual Basic under Windows can communicate with
a Server running Oracle - the Client sends the Server a SQL command which is interpreted
and the result sent back to the Client.

Also, all DBMS's use SQL in their internal operations. Database Administrators (the people
who build and maintain the database structure) need in-depth knowledge of the language.

To build the database and test our SQL commands we'll be using MS-Access and
the Project Management database that we designed in previous lessons. If you haven't used
Access before, take a look at our MS-Access tutorial to get the basics of the tool.

To create a SQL query with Access you simply go through the normal query procedure and
select SQL View instead of the Query wizard; then specify that it is a new query (you don't
have to identify the tables used):

Fig. 3-1

and then write the code and run it:


Fig. 3-2

SQL syntax in Access

SQL syntax is not very strict. A statement can be written over several lines, but most
implementations of SQL will insist on the semicolon (;) at the end.

Upper and lower cases don't matter but again, in most installations you will see common
practices such as writing all command verbs and clauses in uppercase and table names,
column names, etc. in lowercase.

Syntax errors will be flagged as soon as you try to execute the statement. Experience will
show what the most common errors tend to be. Since SQL syntax is not very complicated to
begin with, errors are usually easy to detect and to fix.

There is not a whole lot of punctuation involved. The ; at the end of the statement is
important and, of course, parentheses have to be in the right places, like any other
language. As for data types, string values are enclosed in single quotes, dates in pound signs
and numeric values in nothing.

For example,
e_salary = 55000
e_fname = 'Mike'
e_hiredate = #1995-10-10#

SQL INSTRUCTIONS

The SQL instruction set consists of only about 30 instructions. Although there are SQL
instructions to create and manipulate tables and the data they contain, it is quite possible
that all the maintenance functions will be done using the DBMS (Access in this case). If that
is the way your system is set up your applications will end up using the SELECT instruction
95% of the time.
In case you missed it previously, a query is a question, an interrogation, a lookup. That is
what SQL is built for - it exists to get information from databases.

In this tutorial, we will assume that the database itself is already created and named. Some
of the tables may have been created in Access but we will use SQL statements to create the
others, just to make sure that we know how to do it.

Table manipulation statements

There are SQL statements to create tables, modify them or remove them.

To create a new table in the current active database:

CREATE TABLE table_name (column1 datatype not null, column2 datatype, ...);

Example:
CREATE TABLE employee (e_id string(3) not null, e_fname string(20),
e_salary single, e_hiredate date);

The usual datatypes are:

INTEGER      Integer values between -32K and +32K
SINGLE       Single-precision floating point
DOUBLE       Double-precision floating point
DATE         Date/time
STRING(n)    Fixed-length string; n = number of characters
BOOLEAN      True/False

We add the not null clause to the statement to indicate that null is not allowed in the
column, if it is to be a primary key, for example.
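
To see the not null clause at work, here is a small sketch that runs the statement through Python's built-in sqlite3 module. This is not Access: SQLite's type names (TEXT, REAL) stand in for string and single, which is an assumption of this sketch, and the sample values are the ones used earlier in the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite type names stand in for the Access ones (TEXT for string, REAL for single)
conn.execute(
    "CREATE TABLE employee ("
    " e_id TEXT NOT NULL,"
    " e_fname TEXT,"
    " e_salary REAL,"
    " e_hiredate TEXT)"
)

# List the columns the statement created (field 1 of table_info is the name)
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# The NOT NULL clause refuses a row that has no e_id
try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'Mike', 55000, '1995-10-10')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```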

Speaking of NULL

Everyone knows by now that when we speak of characters, we mean the letters
of the alphabet and the numbers and punctuation signs and so on. The space ( )
is a character and so is the zero (0).

In SQL we will often have to refer to the NULL value. NULL is not a
character; it is the absence of a character. In books they say that NULL means
that the value is undetermined. In fact, it means that there is no value assigned
to the field, it is completely empty. NULL is not numeric, nor string, nor date.
Any type field can be NULL.
When the quantity-on-hand of an item in stock becomes 0, it is not null; it
contains the numeric character 0. When I assign a 0 grade to an assignment
(which happens all too frequently), that grade is included in the class average.
If there is no grade assigned because the student was ill, that field is null and
therefore it is not computed as part of the class average.

SQL functions that count or compute over a column (COUNT, SUM, AVG and so
on) will not consider nulls. In some cases it is necessary to test if a field
contains a value or not by using the clauses: IS NULL or IS NOT NULL in a statement.

In the example above, when entering data in the Employee table, you could
theoretically have one employee with spaces as an Id but, you are not allowed
to have one with an empty Id.
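
The grade example can be checked with a short sketch. It uses Python's built-in sqlite3 module instead of Access, and the Grades table, the student names and the grades are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Grades (student TEXT, grade REAL)")
conn.executemany(
    "INSERT INTO Grades VALUES (?, ?)",
    [("Ann", 80), ("Bob", 0), ("Cal", None)],  # Python's None is stored as NULL
)

# AVG uses the 80 and the 0 but skips the NULL: (80 + 0) / 2
average = conn.execute("SELECT AVG(grade) FROM Grades").fetchone()[0]

# COUNT(grade) skips the NULL, COUNT(*) counts every row
graded, total = conn.execute(
    "SELECT COUNT(grade), COUNT(*) FROM Grades"
).fetchone()

# IS NULL finds the student who has no grade assigned
missing = conn.execute(
    "SELECT student FROM Grades WHERE grade IS NULL"
).fetchall()
```

Bob's 0 pulls the average down to 40; Cal's missing grade does not count at all.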

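To change the structure of a table:

ALTER TABLE table_name ADD (column datatype);

Example:
ALTER TABLE Employee ADD (e_Address string(30));

ALTER TABLE Project ADD (p_Country string(20));

Note that this form of the statement only adds columns. Changing or removing a column
is done with other ALTER TABLE clauses (DROP COLUMN, for example), and their syntax
and availability vary from one DBMS to another.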

To delete a table from the database:

DROP TABLE table_name;

Example:
DROP TABLE Employee;

Data manipulation statements

Data manipulation statements are used to work on the data contained in the tables.

To create a new record, a new row, in a table:

INSERT INTO table_name VALUES (value1, value2, ...);

Assuming that we have executed the CREATE TABLE and the ALTER TABLE statements from
above (and not the DROP statement), the Employee table now contains 5 columns: Id,
First name, Salary, Hire date and Address.

The INSERT statement will create a new employee record; it will add a row to the table.
The number of data items must correspond to the number of columns and the type of data
must correspond to the datatype of each column.

Example:
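
As a sketch only, here is an INSERT run through Python's built-in sqlite3 module rather than Access. The employee Id, the address and the second employee are made-up values, and SQLite's TEXT and REAL types and its ADD COLUMN syntax stand in for the Access ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (e_id TEXT NOT NULL, e_fname TEXT,"
    " e_salary REAL, e_hiredate TEXT)"
)
conn.execute("ALTER TABLE Employee ADD COLUMN e_Address TEXT")

# Five values, in column order: Id, First name, Salary, Hire date, Address
conn.execute(
    "INSERT INTO Employee VALUES ('222', 'Mike', 55000, '1995-10-10', '12 Main St')"
)
inserted = conn.execute("SELECT * FROM Employee").fetchall()

# Two values for five columns: the statement is refused
try:
    conn.execute("INSERT INTO Employee VALUES ('223', 'Anna')")
    short_row_rejected = False
except sqlite3.OperationalError:
    short_row_rejected = True
```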

To change data in an existing record:

UPDATE table_name SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:
UPDATE Employee SET e_salary = 30000
WHERE e_Id = '222';

UPDATE Employee SET e_Salary = e_Salary * 1.1
WHERE e_Departement = '101';

The WHERE clause is SQL's IF statement. The update is done only if the condition in the
WHERE clause is true.

In the first example, the update is performed only for the employee whose Id is '222'. His
salary is set at 30000.

In the second example, the update is performed for every employee whose department is
'101'.

In example 2, the command from the boss to launch the SQL statement would have been:
"Give everybody in department 101 a 10% raise".

UPDATE Employee SET e_Salary = 100000;

If there is no WHERE clause in a statement, the update is performed on all the records in
the table.
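
The three UPDATE statements can be tried on a throwaway table. This sketch uses Python's built-in sqlite3 module rather than Access; the three employees, their salaries and the column name e_Dept are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (e_Id TEXT, e_Salary REAL, e_Dept TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("111", 40000, "101"), ("222", 25000, "101"), ("333", 50000, "102")],
)

def salaries():
    """Return {e_Id: e_Salary} for the whole table."""
    return dict(conn.execute("SELECT e_Id, e_Salary FROM Employee"))

# WHERE limits the update to one employee
conn.execute("UPDATE Employee SET e_Salary = 30000 WHERE e_Id = '222'")
after_single = salaries()

# WHERE limits the 10% raise to one department
conn.execute("UPDATE Employee SET e_Salary = e_Salary * 1.1 WHERE e_Dept = '101'")
after_dept = salaries()

# No WHERE: every row is changed; rowcount says how many
changed = conn.execute("UPDATE Employee SET e_Salary = 100000").rowcount
after_all = salaries()
```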

To remove records or rows from a table:

DELETE FROM table_name WHERE condition;

Example:
DELETE FROM Employee
WHERE e_Id = '222';

DELETE FROM Employee
WHERE e_Salary > 100000;

In the first case, delete employee '222'. In the second case, delete everybody earning more
than 100K. Hey! What you gonna do? Times are tough all over!

DELETE FROM Employee;

If there is no WHERE clause, every record in the table is deleted! And it won't even ask
"Are you sure?".
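
Since nothing asks "Are you sure?", one cautious habit (a suggestion, not a rule of the language) is to run a SELECT with the same WHERE clause first and look at what would go. The sketch below uses Python's built-in sqlite3 module rather than Access, with made-up employees and salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (e_Id TEXT, e_Salary REAL)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("111", 40000), ("222", 30000), ("333", 120000)],
)

def ids():
    """Return the sorted Ids still in the table."""
    return sorted(row[0] for row in conn.execute("SELECT e_Id FROM Employee"))

# Preview: the same WHERE clause in a SELECT shows which rows would be deleted
to_delete = conn.execute(
    "SELECT e_Id FROM Employee WHERE e_Salary > 100000"
).fetchall()

conn.execute("DELETE FROM Employee WHERE e_Salary > 100000")
after_where = ids()

# No WHERE: the table is emptied (the table itself still exists)
conn.execute("DELETE FROM Employee")
after_all = ids()
```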

Database servers
In the previous lesson we started looking at the SQL language syntax. I used an Access
database to illustrate how to test our SQL commands against a real database.

You have to understand that Access is not typical of the database environment you will
probably work with in the real world.

For one thing, it's meant to work in standalone mode (with one user) or, at most, shared
among 4-5 users. Access is not a database server.

Also, in Access you don't get to see the SQL code very much. It's always there but it's
behind the scenes, written by the various wizards and hidden behind forms or reports or
QBE queries.

For real databases (and by that I mean really big or with many users) you will normally
work with a database server. A database server is software that resides on a large
computer with good communications capabilities. It stores the data, handles the
maintenance of the tables and responds to the demands of clients who want to manipulate
the data.

The best-known server is Oracle. It can handle any database from a few dozen users to
thousands of users.

Oracle's biggest competitor in the really big applications is SAP.

Then, for small to medium jobs you've got SQL Server from Microsoft, which is sort of the
big brother to Access.

All these products are based on what we call the relational model. They all store the data
in tables, they have primary keys and they build relationships between the tables.
And they all use the SQL language to communicate between the server and the clients.
Some of the servers have modified SQL somewhat for their own use but, it's essentially the
same language for everyone.

Lesson 4 – Queries
The database
Before going any further, please make sure that the ProjectMgt database you are working
with matches the model we created initially. You may have experimented with the tables
and the columns in the previous lesson and that is perfectly OK! But before going on to
the query statements, it is recommended that you consult Fig. 2-8 in Lesson 2 and match
your database to that model. Don't worry about primary keys and relationships and so on at
this point. We'll take care of that later. But do enter some meaningful data in the tables so
that your queries will have something to display when you run them (15 or 20 records in
each table should be enough).

When you input data into the tables, if you haven't created the relationships in Access, try
to maintain referential integrity. That is: when you assign a department number to an
employee, that department number should already exist in the Department table. When you
create a Timesheet record, the employee number should exist in the Employee table and
the project number should exist in the Project table.

You may use SQL statements to change the database or you may do it with Access. If
you're really lazy...er, sorry, really busy, you can download the database from the
Download area after the last lesson.

Import from Access

Since we've already built the Project management database in Access, it seems a shame to
waste all that work.

Fortunately, DBManager has a Wizard to convert the Access database into MySQL.
Creating Queries

As we mentioned earlier, it is quite probable that 95% of your work with SQL will consist of
questions to the database. If the database structure is well-built and the information has
been input, any question can be answered, no matter how tricky. "How many widgets were
bought by women aged between 25 and 30 on Tuesdays in months ending in R over the
past 5 years?" Give us 5 minutes and we'll build a query that will answer that for you. That's
called an ad hoc query, which means "as needed" rather than one which has to be planned
and programmed in advance. It can impress the hell out of the Boss or the Sales Manager!
Hey! Before you know it we'll be as good at this stuff as the guys who do the baseball stats
on TV. "Yes Frank, it's amazing that this guy hit 255 when batting left-handed against right-
handed pitchers in night games when the moon was full and the temperature was over 75
degrees and there was a light breeze from the west!"

The only statement needed to build a query is the SELECT statement.

The basic syntax of the SELECT statement is:

SELECT column1, column2, ... FROM table_name1, table_name2, ... WHERE condition;

The SELECT clause lets you specify which columns to display (they may be table columns or
they may be calculated from the data in other columns). The FROM clause lets you specify
the table or tables from which the data will be obtained. Note that the standard SELECT
statement allows you to get the data from as many tables as you need. If you have to
access the Employee table and the Timesheet table to build the query, you can do it. If you
have to access 15 tables, you can do it. But that's a lot more involved and we'll leave it for
another day, more specifically, Lesson 7. For the next few lessons we'll master the SELECT
statement to access any information we need in one table at a time. Finally,
the WHERE clause (see below) will determine which records, also referred to as rows, will be
selected.
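
To see the three clauses at work without opening Access, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in database (its SQL dialect is close to, but not the same as, the one in Access), and the three Department rows are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (d_DeptNum INTEGER PRIMARY KEY, d_DeptName TEXT)")
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(10, "Accounting"), (20, "Engineering"), (30, "Sales")],
)

# SELECT * returns every column of every row.
all_columns = conn.execute("SELECT * FROM Department").fetchall()

# Naming a column returns only that column; WHERE keeps only the matching rows.
one_name = conn.execute(
    "SELECT d_DeptName FROM Department WHERE d_DeptNum = 20"
).fetchall()

# AS replaces the field name with a friendlier column heading.
cursor = conn.execute("SELECT d_DeptName AS [Department name] FROM Department")
heading = cursor.description[0][0]

print(all_columns)
print(one_name)
print(heading)
```

The column heading is read from the cursor's description, which is where a program (rather than a query window) finds the names of the result columns.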

Here are some examples of the SELECT in action:

Fig. 4-1

Instead of listing the columns, use the * to mean 'All the columns'

And the result is:


Fig. 4-2

Or display only certain columns:

Fig. 4-3

and the result is:

Fig. 4-4

To get data from the Employee table:


Fig. 4-5

from which we get:

Fig. 4-6

Same query, different look:

Fig. 4-7

The 'AS' clause allows you to display a column heading that is more representative
than the field name usually displayed by the query. Compare with Fig. 4-6.

Fig. 4-8

This is what it looks like with the Query Editor:


THE WHERE ... CLAUSE

As stated previously, the WHERE clause is in fact an IF statement. If a record returns TRUE
to the WHERE clause, it is selected to be displayed.

If the table contains 10,000 records, or rows, you may wish to see only a few or even only
one. In that case you would specify the condition as "... WHERE primary_key_column =
'value' ...".
The WHERE clause uses the usual operators to build the condition:

= > < >= <= <> or !=

and a few you may not be as familiar with but which we'll see in the examples:

BETWEEN LIKE IN NOT

For the next examples, suppose we have a new table called Products. Note that we can
create this table in the ProjectMgt database, even though it has absolutely nothing to do with
the application. We have to put the table somewhere and that's as good a place as any. It's
important to understand that the tables have no relationships between each other until we
define those relationships. If we want to create a table to be used on its own and then drop
it when we're done, there is no problem with that.

PRODUCTS
ProdNum
ProdName
SellPrice
Cost

Fig. 4-9

EXAMPLES:

SELECT * FROM Products
WHERE ProdNum = 'A1234';

SELECT ProdNum, ProdName, SellPrice
FROM Products
WHERE SellPrice > 50;

SELECT ProdNum, SellPrice, (SellPrice * 1.1)
FROM Products;

Here we display a calculated column: in this case, what a 10% price increase would look like.

Calculations use the usual arithmetic operators:

+ - * / ^ ( )

There is a common misconception about
calculated columns in the SELECT statement -
people think that the calculation will somehow
change the data in the table. That is impossible.
The SELECT statement is strictly
a display statement. Any calculations done are
read-only. There is no way that a SELECT can
modify a table. The only statements that can do
that are the ones we looked at in the previous
lesson: INSERT, UPDATE and DELETE.
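
If you'd like to convince yourself, the following sketch runs the price-increase query and then reads the table again. It assumes Python's built-in sqlite3 module standing in for Access, and the two products are invented sample rows.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the products are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProdNum TEXT PRIMARY KEY, ProdName TEXT, "
    "SellPrice REAL, Cost REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [("A1234", "General widget", 100.0, 60.0),
     ("B5000", "Deluxe widget", 200.0, 150.0)],
)

# The calculated column exists only in the query result.
increased = dict(conn.execute("SELECT ProdNum, (SellPrice * 1.1) FROM Products"))

# Reading the table again shows that the stored prices are untouched.
stored = dict(conn.execute("SELECT ProdNum, SellPrice FROM Products"))

print("With 10% increase:", increased)
print("Still in the table:", stored)
```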

SELECT ProdNum, SellPrice, Cost, (SellPrice - Cost) AS [Profit]
FROM Products
WHERE ProdNum LIKE 'A*';

we can display the calculated column with an appropriate title, for all products whose
number starts with 'A'.

* and ? are the wildcard characters in Access:

* = character string (any number of characters)

? = 1 character

SELECT ProdNum, ProdName
FROM Products
WHERE ProdNum LIKE "A?5??";

SELECT ProdNum, ProdName, SellPrice
FROM Products
WHERE SellPrice BETWEEN 50 AND 150;

could also be written as SellPrice >= 50 AND SellPrice <= 150

SELECT ProdNum, ProdName
FROM Products
WHERE ProdName LIKE "*general*";

display if the name contains the string 'general'

SELECT ProdNum, ProdName
FROM Products
WHERE ProdNum IN ('A100', 'A200', 'B500', 'D800');

if the product number is one of those named

AND and OR are used like in all other languages:

SELECT * FROM Products
WHERE ProdName LIKE "A*" AND SellPrice > 500;

SELECT * FROM Products
WHERE (SellPrice - Cost) < 10 OR (SellPrice - Cost) > 500;

display the low-profit and the high-profit items
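
The WHERE examples above can be tried out on a handful of rows. This is only a sketch: it assumes Python's built-in sqlite3 module instead of Access, the four products are invented, and prod_nums is a hypothetical helper written for this demo. Note that SQLite (like MySQL) uses % and _ as wildcards where Access uses * and ?, and that its LIKE ignores case for ordinary ASCII letters by default.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the products are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProdNum TEXT PRIMARY KEY, ProdName TEXT, "
    "SellPrice REAL, Cost REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [("A1234", "General widget", 100.0, 60.0),
     ("A2500", "Garden hose", 40.0, 35.0),
     ("B500", "Attorney general kit", 900.0, 300.0),
     ("D800", "Anvil", 600.0, 590.0)],
)

def prod_nums(condition):
    """Hypothetical demo helper: sorted product numbers of the rows passing the condition.

    The condition is pasted into the SQL text, which is fine for the fixed
    strings below but should never be done with user input.
    """
    rows = conn.execute("SELECT ProdNum FROM Products WHERE " + condition)
    return sorted(row[0] for row in rows)

starts_with_a = prod_nums("ProdNum LIKE 'A%'")       # Access: LIKE 'A*'
pattern = prod_nums("ProdNum LIKE 'A_5__'")          # Access: LIKE 'A?5??'
mid_priced = prod_nums("SellPrice BETWEEN 50 AND 150")
general = prod_nums("ProdName LIKE '%general%'")     # Access: LIKE '*general*'
in_list = prod_nums("ProdNum IN ('A100', 'A200', 'B500', 'D800')")
extremes = prod_nums("(SellPrice - Cost) < 10 OR (SellPrice - Cost) > 500")

for label, result in [("starts with A", starts_with_a), ("A?5??", pattern),
                      ("50 to 150", mid_priced), ("general", general),
                      ("in list", in_list), ("low or high profit", extremes)]:
    print(label, result)
```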


Working with dates

Whenever you develop a commercial application, there is absolutely no way that you can
get by without using date fields. There are Birth dates, Hire dates, Delivery dates, Order
dates, and so on, and so on ....

In ancient times, like 20 years ago, dates were stored as strings and we all remember what
that brought about in 1999. Now all DBMSs handle dates in a Date/Time format, which
makes our lives a lot simpler, but we have to be aware of the particular properties of Date
formats.

To begin with, know that you can do calculations with dates as you do with numbers.

#2001-01-31# - #2001-01-01# will return 30, the number of days between the 2 dates.

#2001-01-01# + 3 will return #2001-01-04# because a numeric constant is always
taken to mean days.

When using the comparison operators, > #date1# is taken to mean later than or after
and < #date1# is taken to mean earlier than or before.

In the WHERE ... clause, ... BETWEEN #date1# AND #date2# selects the dates from
date1 to date2, inclusive.

To work with date fields in SQL, we'll use the Date and Time functions that Access supplies.
Note that equivalent functions exist in just about every environment that supports SQL,
although their names and syntax vary from one DBMS to another.

The main functions: NOW( ) and DATE( ) return the current date. The difference between
the two is that NOW( ) returns date and time, at this moment, and DATE( ) returns only
the current date.

In Access, a date or time constant must be identified with # ... #, as in:


... WHERE p_startdate = #2001-01-01#;

Date formats
If you intend to do e-commerce in the global village, you have to
understand that different folks have different ways of doing things.

For example, if you are American and you tell your French
girlfriend, the love of your life, that you'll meet her under the Eiffel
tower on 01/02/03, there is a good chance that you'll never see her
again. To you it is obvious that you specified the date as January
2nd, 2003. In France, as in other French areas, like Quebec, the date
is understood to be the 1st of February, 2003. In your case, it may
work out. If you straighten out the misunderstanding in time, you go
back a month later and she's waiting for you. Good luck!

To avoid problems, get used to using the ISO 8601 international
standard date format: yyyy-mm-dd, as: 2003-01-02. Note the use
of the 4-digit year. Remember all that anguish we went through in
1999 with the 2-digit 00 year? We don't want that to happen again.
Also, note that the separator is the dash character - , and not the slash
/.
To set the date format, go through the Windows Control Panel,
Regional settings. Since SQL and Access get their formatting from
Windows, the format will be selected automatically.

In Access and SQL, one of the most useful functions is called: DateDiff( )

DateDiff('interval', #date1#, #date2#) returns the time difference between date1 and
date2, expressed in interval units which could be: days, months, years, weeks or hours.

The interval is specified as: 'd' for days, 'w' for weeks, 'm' for months, 'yyyy' for years and 'h' for hours.

For example:

Datediff('d', #2001-01-01#, now()) returns the number of days between January 1st
and today.

Datediff('m', p_StartDate, p_EndDate) returns the length of the project, in months.

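Datediff( ) itself returns a whole number. If you divide that result and it displays too many
numbers after the decimal, use the ROUND(number, digits) function to display the number
rounded to 'digits' positions after the decimal. For example, the length of the project in weeks:
ROUND(Datediff('d', p_StartDate, p_EndDate) / 7, 2).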

In theory, Datediff('yyyy', e_BirthDate, now()) returns the employee's age, expressed
in years. In practice however, you will find that it works or doesn't work depending on
whether the employee has had his birthday yet this year or not, because the function
simply subtracts one year number from the other.

To calculate the age more accurately, use the following formula:

INT(Datediff('d', e_BirthDate, now())/365.25)

Calculate the number of days and divide by the average number of days in a year, which
is 365.25 and not 365. That takes leap years into account, although the result can still be
off by a day right around the birthday.
The INT( ) function truncates the result so that 25.9 becomes 25, for example; the
employee is 25 years old until the day she turns 26; after the age of 5, you rarely hear
people say that they are 25 and a half years old.
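
The arithmetic is easy to check outside the database. The sketch below redoes the three approaches in plain Python with the standard datetime module; the function names are hypothetical, and the birth date and the fixed "today" are made-up values chosen so that the birthday has not yet come around this year.

```python
from datetime import date

def year_difference(birth, today):
    """What Datediff('yyyy', ...) does: subtract the year numbers only."""
    return today.year - birth.year

def age_by_days(birth, today):
    """The tutorial's formula: whole days divided by 365.25, then truncated."""
    return int((today - birth).days / 365.25)

def age_by_birthday(birth, today):
    """Reference answer: take one year off if this year's birthday is still to come."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

birth = date(1980, 6, 15)
today = date(2003, 1, 2)   # fixed date so the result is reproducible

print(year_difference(birth, today))   # one year too many
print(age_by_days(birth, today))
print(age_by_birthday(birth, today))
```

With these dates the year subtraction overshoots by one, while the 365.25 formula agrees with the birthday check.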

When working with age, remember that you can often use Date-of-birth directly, without
doing the age calculation. Don't forget that the smallest date refers to the oldest person.

Eliminating duplicates

To close out this section on SELECTs, we'll look at how to eliminate duplicate lines from
query results.

For example, suppose we want to see the list of countries where we have projects. If we do
this:

SELECT p_Country FROM Project;

we get all the countries for all the projects; if there are 5 projects in Canada, "Canada" will
appear 5 times in the list.

If we want only the different countries, we can do this:

SELECT DISTINCT p_Country FROM Project;

where the DISTINCT clause will list only different occurrences; if an item is selected more
than once, only one will appear.
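
Here is the same comparison on a tiny made-up project table, as a sketch that assumes Python's built-in sqlite3 module in place of Access. The four rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the projects are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Project (p_Number TEXT PRIMARY KEY, p_Country TEXT)")
conn.executemany(
    "INSERT INTO Project VALUES (?, ?)",
    [("P1", "Canada"), ("P2", "Canada"), ("P3", "Germany"), ("P4", "Canada")],
)

# Without DISTINCT: one line per project, so countries repeat.
all_countries = [row[0] for row in conn.execute("SELECT p_Country FROM Project")]

# With DISTINCT: each country appears once.
unique_countries = [
    row[0] for row in conn.execute("SELECT DISTINCT p_Country FROM Project")
]

print(all_countries)
print(sorted(unique_countries))
```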

Case study - Bids management

Our organization, FAM.org, runs development projects in foreign countries.


Each project has a name, a project chief (or Head), a start and end date and an amount of money
budgeted to complete the project.
To actually operate the projects, FAM calls upon companies from the private sector.
For every project, companies are invited to submit bids. They must specify how much they will
charge to do the project. FAM will compare the bids and will select a company from the list of
bidders.
In order to submit a bid, a company must be registered with FAM. It must provide details such
as: name of CEO, number of employees and date of creation of the company. City and state in
which the company HQ is located are also important because, as you may guess, there will be
politics involved.
The purpose of the database is to track the bids. There are hundreds of projects ongoing or
planned at any given time and there are dozens of companies bidding on those projects. Just
tracking all the bids is a big job.
FAM wants to be able to see which companies bid on which projects, how much the bids differ
from the budget amounts, how qualified the companies are in terms of experience and size, what
the start and end dates are for all the various projects, etc.

Download sample database

You can download the sample Bids management database from the Download Area
The database is created in MySQL and MySQL Query Browser is used to execute the managers'
requests.
Which companies bid on project 05-7777?

SELECT WITH THE AGGREGATE FUNCTIONS


You use aggregates in the SELECT statement when you want to get summary information
(statistics) on sets of data.

Here we assume that you've done enough programming to know that a function is a system-
defined program that accepts a parameter from the user and returns an answer. A function is
always composed of a keyword followed by parentheses ( ).

Aggregate functions

SUM (expression)      total values in a numeric expression
AVG (expression)      average values in a numeric expression
COUNT (expression)    the number of non-null values
COUNT (*)             the number of selected rows
MAX (expression)      the highest value in the expression
MIN (expression)      the lowest value in the expression

First, note that an aggregate function will always return only one row. That's because it answers
a question referring to a group or set of data. You can find the biggest value in the set but you
can't know what item that biggest value refers to. Same with the smallest value or the average.

It looks like a good idea to write something like:

SELECT ProdNum, ProdName, MIN(Cost) FROM Products;


to get the name of the lowest-cost item. But it won't work because the aggregates don't work on
individual rows.

EXAMPLES:

To obtain the biggest SellPrice in the Products table:

SELECT MAX(SellPrice) FROM Products;

To obtain the number of rows in the Products table, in fact the number of products carried:

SELECT COUNT(*) FROM Products;


Now, the previous statement will count the number of products based on the number of primary
keys or Product numbers entered.

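If you thought that there might be many duplicates in the items carried, you assume that the
duplicates would have the same ProdName; so by counting ProdName and then counting the
DISTINCT values of ProdName you would get an idea of how many duplicates there are, although
you cannot establish what they are:

SELECT COUNT(ProdName) FROM Products;

SELECT COUNT(*) FROM (SELECT DISTINCT ProdName FROM Products) AS Names;

Note that DISTINCT has to be applied before the count: writing SELECT DISTINCT COUNT(ProdName)
would just return the same single total. That is why the second statement counts the rows of a
DISTINCT query placed in parentheses (more on that technique under USING SUBQUERIES). In
MySQL you can write COUNT(DISTINCT ProdName) instead.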

The WHERE clause can also be used with aggregates to define the set of data to be calculated.

To find out how many big-profit items you have (assuming that big means more than $500), you
do this:

SELECT COUNT(*) AS [Number of big-profit items]
FROM Products
WHERE (SellPrice - Cost) > 500;

Or, in this case, to get the average cost of sporting goods, assuming that items in the Sports
department all have a number starting with 'S':

SELECT AVG(Cost) AS [Average cost of Sports]
FROM Products
WHERE ProdNum LIKE "S*";

The AVG function will return the average of a set of numerical values and SUM will return a
total:

SELECT AVG(SellPrice) FROM Products;

SELECT SUM(Cost) FROM Products;

Although you are not allowed to work on individual rows, you are allowed to use several
aggregates in the same statement:
SELECT SUM(Cost), COUNT(Cost), AVG(Cost), AVG(SellPrice)
FROM Products;
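
To watch several aggregates come back as a single row, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than Access, and the four products (including one with no cost entered) are invented. COUNT(DISTINCT ...) is used because SQLite accepts it; Access does not.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the products are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProdNum TEXT PRIMARY KEY, ProdName TEXT, "
    "SellPrice REAL, Cost REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [("A100", "Widget", 100.0, 60.0),
     ("A200", "Widget", 120.0, 80.0),          # same name as A100
     ("S300", "Soccer ball", 30.0, 10.0),
     ("S400", "Tennis racket", 700.0, None)],  # cost not entered yet (NULL)
)

# Several aggregates in one statement still produce exactly one row.
summary = conn.execute(
    "SELECT COUNT(*), COUNT(Cost), COUNT(DISTINCT ProdName), "
    "SUM(Cost), AVG(Cost), MAX(SellPrice) FROM Products"
).fetchone()

# WHERE narrows the set of rows before the aggregate is calculated.
sports_items = conn.execute(
    "SELECT COUNT(*) FROM Products WHERE ProdNum LIKE 'S%'"
).fetchone()[0]

print(summary)
print(sports_items)
```

COUNT(*) counts all four rows while COUNT(Cost) skips the row whose cost is NULL, and AVG(Cost) is likewise calculated over the three known costs only.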


USING SUBQUERIES

We said earlier that you cannot use an aggregate function together with individual rows. You can find
what the biggest SellPrice is but you can't find which Product it belongs to. Although that's true with the
normal SELECT statement, there is a way to work around it. It's called a subquery and it relies
on what we call the priority of operators in programming - the fact that any operation in
parentheses, ( ), is executed first in a statement because ( ) is the operator with the highest
priority.

When we do this:

SELECT MAX(SellPrice) FROM Products;


the query returns the value of the biggest SellPrice.

Now, if we enclose that statement in parentheses and use it as a subquery in another statement
like this:

SELECT ProdNum, ProdName, SellPrice FROM Products
WHERE SellPrice = (SELECT MAX(SellPrice) FROM Products);

the subquery returns a single value which is then used in the WHERE clause of the main
statement to display the number and name of the product having the biggest SellPrice. If more
than one product has the max price, several rows will be displayed.

What products cost more than the average cost of products?

First, calculate the average cost in a subquery and then compare the table with that value:

SELECT ProdNum, ProdName, Cost FROM Products
WHERE Cost > (SELECT AVG(Cost) FROM Products);
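
Both subqueries can be tried on three invented products. As before, this is a sketch that assumes Python's built-in sqlite3 module as a stand-in for Access; two of the products share the top price on purpose.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the products are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProdNum TEXT PRIMARY KEY, ProdName TEXT, "
    "SellPrice REAL, Cost REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [("A100", "Widget", 100.0, 60.0),
     ("B200", "Gadget", 700.0, 300.0),
     ("C300", "Gizmo", 700.0, 90.0)],
)

# The subquery in parentheses produces one value (the biggest SellPrice);
# the main query then finds every product that has that price.
top_priced = sorted(row[0] for row in conn.execute(
    "SELECT ProdNum FROM Products "
    "WHERE SellPrice = (SELECT MAX(SellPrice) FROM Products)"
))

# Same idea with the average cost, which is 150 for these rows.
above_average = sorted(row[0] for row in conn.execute(
    "SELECT ProdNum FROM Products "
    "WHERE Cost > (SELECT AVG(Cost) FROM Products)"
))

print(top_priced)
print(above_average)
```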

The subquery can also be used to answer questions where you have to compare data with other
rows from the same table.

Getting back to our ProjectMgt example, we'll use the Employee table.

How would you answer this: "Which employees live in the same city as employee '1234'?".

You could do it in steps.

First you have to find the employee's city:

SELECT e_city FROM Employee
WHERE e_Id = '1234';

and, if it is, let's say, 'Boston', use that in the next statement:

SELECT e_Id, e_Fname, e_Lname FROM Employee
WHERE e_City = 'Boston';

Or, you could decide to do it efficiently and use the subquery technique:

SELECT e_Id, e_Fname, e_Lname FROM Employee
WHERE e_City =
(SELECT e_city FROM Employee WHERE e_Id = '1234');

Which employees are older than John Smith?

SELECT e_Fname, e_Lname, e_BirthDate FROM Employee
WHERE e_BirthDate <
(SELECT e_BirthDate FROM Employee
WHERE e_Fname = 'John' AND e_Lname = 'Smith' );

Note that the subquery must return one and only one value. The WHERE clause in the main
query can only compare to a single value and that means one column from one row. In the
previous statement, if there is more than one 'John Smith' in the company, we've got a problem.
In that case we would have to use e_Id instead of name to identify the person.

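You have to recognize that the following 2 statements don't make any kind of sense. The first
subquery returns every column of the row and the second returns one row per employee, so
neither one produces the single value that the comparison needs:

SELECT e_Id, e_Fname, e_Lname FROM Employee
WHERE e_City =
(SELECT * FROM Employee WHERE e_Id = '1234');

SELECT e_Id, e_Fname, e_Lname FROM Employee
WHERE e_City =
(SELECT e_city FROM Employee );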


Lesson 6 - Sorting and grouping

DISPLAYING RESULTS IN ORDER

It may not have been mentioned specifically yet: in a database table, there is no way to input
data in a given order. In other words, if you add a row to a table you cannot insert it between
other rows so that it comes out automatically in alphabetical order. Whenever a new row is
added, it is simply appended to the end of the table.

If you want the rows to come out in a given order you have to sort them. To sort rows you use
the ORDER BY clause in the SELECT statement.

The syntax of the ORDER BY clause is:


SELECT select_list
FROM table
ORDER BY expression [ASC | DESC];

ASC stands for Ascending order and it is the default value


DESC is used to sort in Descending order

To list all projects in order of their StartDate, with the oldest first (smallest date):

SELECT p_Number, p_Title, p_StartDate, p_EndDate
FROM Project
ORDER BY p_StartDate;

To get the list in reverse order, with the most recent at the beginning, use the same SELECT but
add the DESC option:

SELECT p_Number, p_Title, p_StartDate, p_EndDate
FROM Project
ORDER BY p_StartDate DESC;

You can also have a sort within a sort by specifying 2 sort fields. The most common example of
that is sorting in First name order within Last name - all Smiths will be in First name order, etc.

Note that the main sort field is named first, the secondary sort field is second and so on. In this
case Last name is the main sort order:

SELECT e_Id, e_Fname, e_Lname
FROM Employee
ORDER BY e_Lname, e_Fname;

What if you have a calculated expression that you want to sort on?

If you need to list the length of all projects and sort on that expression:

SELECT p_Number, p_Title, p_StartDate, p_EndDate,
Datediff('m', p_StartDate, p_EndDate) AS [Project length]
FROM Project
ORDER BY 5;

There are 5 elements in the select_list. 'ORDER BY 5' specifies to sort on the fifth element, the
calculated field. You could do the same for any other sort specification. For example, 'ORDER
BY 2' in this example will sort in order of p_Title.
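
The sorting options can be checked on three invented projects. This sketch assumes Python's built-in sqlite3 module instead of Access; because SQLite has no separate date type, the start dates are stored as yyyy-mm-dd text, which happens to sort in date order.

```python
import sqlite3

# In-memory SQLite database standing in for Access; the projects are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Project (p_Number TEXT PRIMARY KEY, p_Title TEXT, p_StartDate TEXT)"
)
conn.executemany(
    "INSERT INTO Project VALUES (?, ?, ?)",
    [("P1", "Clinic", "2002-05-01"),
     ("P2", "Bridge", "2001-01-15"),
     ("P3", "Airport", "2003-03-10")],
)

def project_numbers(order_by):
    """Hypothetical demo helper: project numbers in the order given by ORDER BY."""
    sql = "SELECT p_Number, p_Title, p_StartDate FROM Project ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]

oldest_first = project_numbers("p_StartDate")       # ASC is the default
newest_first = project_numbers("p_StartDate DESC")
by_title = project_numbers("2")                     # second item of the select list

print(oldest_first)
print(newest_first)
print(by_title)
```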

GROUPING DATA

For this next section we are going to use a new database
called BookStor. You can download it now from the Download area. Select the version that you
need. For this lesson we'll use the Authors table only. But, keep the database handy. Later, when
we get to Lesson 9, we'll use the other tables and Queries in BookStor to look at more advanced
concepts such as Crosstab queries and Union queries, etc.

In the previous lesson, we learned how to use the aggregate functions to produce summary
information on sets of data.

From my table of authors, I want to know how many authors produce 'Romance' novels. I use a
simple query with the aggregate function:
SELECT COUNT(au_id) AS [Number of authors]
FROM Authors
WHERE au_subject = 'Romance';

But, what if I want to know how many authors I have in each category, even if I don't know what
the categories are? It can't be done with a simple Select with aggregate.

The answer is a new clause called: GROUP BY which is used with the Select.

The syntax of the GROUP BY clause is:

SELECT select_list
FROM table
GROUP BY group_by_list;

To answer the question above:
SELECT au_subject AS [Subject], COUNT(au_id) AS [Number of authors]
FROM Authors
GROUP BY au_subject;

Want to know how many authors in each state and what their average salary is?

SELECT au_state AS [State],
COUNT(au_id) AS [Number of authors],
AVG(au_salary) AS [Average salary]
FROM Authors
GROUP BY au_state;

For practice, let's apply the stuff from the beginning of this lesson. I want the results in order of
Average salary. That's easy:

SELECT au_state AS [State],
COUNT(au_id) AS [Number of authors],
AVG(au_salary) AS [Average salary]
FROM Authors
GROUP BY au_state
ORDER BY 3;

Remember that ORDER BY 3 means the third item in the select_list.


Now, let's say you don't want to see details of all the states in the Authors table, you just want to
see authors from Utah and Kansas.

There is another clause to select groups, the same way as the WHERE clause selects rows. That
clause is called HAVING and is used the same as the WHERE but, on groups.

SELECT au_state AS [State],
COUNT(au_id) AS [Number of authors],
AVG(au_salary) AS [Average salary]
FROM Authors
GROUP BY au_state
HAVING au_state IN ('UT', 'KS');

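Note the use of the IN operator to mean 'If the author's state is in this list, the group is selected'.
The clause could also have been written as: HAVING au_state = 'KS' OR au_state = 'UT'.
HAVING really comes into its own when the condition involves an aggregate, as in
HAVING COUNT(au_id) > 5, which is something a WHERE clause cannot do.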
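
To see the groups form, here is a sketch on six invented authors. It assumes Python's built-in sqlite3 module in place of Access and only the four Authors columns used in this lesson. The last query is an extra that is not part of the lesson's examples: a HAVING condition on an aggregate.

```python
import sqlite3

# In-memory SQLite database standing in for BookStor; the authors are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Authors (au_id INTEGER PRIMARY KEY, au_state TEXT, "
    "au_subject TEXT, au_salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Authors VALUES (?, ?, ?, ?)",
    [(1, "UT", "Romance", 40000), (2, "UT", "Mystery", 60000),
     (3, "KS", "Romance", 30000), (4, "CA", "Romance", 50000),
     (5, "CA", "Cooking", 70000), (6, "CA", "Romance", 90000)],
)

# One result row per subject, without knowing the subjects in advance.
by_subject = conn.execute(
    "SELECT au_subject, COUNT(au_id) FROM Authors "
    "GROUP BY au_subject ORDER BY au_subject"
).fetchall()

# One result row per state, with two aggregates per group.
by_state = conn.execute(
    "SELECT au_state, COUNT(au_id), AVG(au_salary) FROM Authors "
    "GROUP BY au_state ORDER BY au_state"
).fetchall()

# HAVING keeps or drops whole groups.
two_states = conn.execute(
    "SELECT au_state, COUNT(au_id), AVG(au_salary) FROM Authors "
    "GROUP BY au_state HAVING au_state IN ('UT', 'KS') ORDER BY au_state"
).fetchall()

# Extra: HAVING with an aggregate keeps only states with more than one author.
busy_states = [row[0] for row in conn.execute(
    "SELECT au_state FROM Authors GROUP BY au_state "
    "HAVING COUNT(au_id) > 1 ORDER BY au_state"
)]

print(by_subject)
print(by_state)
print(two_states)
print(busy_states)
```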

You can also group data on more than one column.

How many authors of each category are there in each state?

SELECT au_state, au_subject, COUNT(au_subject) AS [Number]
FROM Authors
GROUP BY au_state, au_subject
ORDER BY 2;

If you want to see it displayed differently, just change the sort order:

SELECT au_state, au_subject, COUNT(au_subject) AS [Number]
FROM Authors
GROUP BY au_state, au_subject
ORDER BY 1;

Lesson 7 - Joining tables

A new case study - The Editor Project

This is one that's been around for many years. It's been used to teach SQL for the last 25 years, at
least. I converted it to run in MySQL.

You've got Authors. Authors write books. Sometimes, a book is written by several authors, each
of whom will receive a percentage of the royalties. Some authors have written several books;
others have yet to write any (we may have a file on them because they're in the process of writing
and we gave them an advance). So, the relationship between Authors and Books is many-to-
many.

The BookAuthor table is the linking table between authors and books.

Publishers are companies that print and distribute the books. We'll get to that relationship later.

P.S. You'll notice that the last column in most tables has a funny sign at the end of the data.
That's a carry-over of a CRLF because the table was imported from Access. It would be better to
modify those because some queries will not work properly.

As an exercise in SQL, use an Update command to change them, as in this example:

If you're not sure how that works, look up the Update syntax in the Query browser.

The % sign is the wildcard character in MySQL, like the * in Access.


A FEW MORE WORDS ON MODELING

If you've been following this from the beginning, you've been playing around with the
ProjectMgt database. It's perfectly OK to have run tests on it, to have added or changed data or to
have changed the structure of the tables themselves. Before continuing, however, we should
standardize the database so that we're all on the same wavelength for the rest of the lessons.

Let's review. We started out with this model:

Then we added a few columns to different tables: e_BirthDate in Employee, p_Country in
Project and maybe a few more.

But there is still one problem with the design. In fact, the problem is that the database is not
normalized to the Third Normal Form (3NF). Uh? Let's look at it in practical terms.

If you have one table for Timesheets, you get one row for each timesheet entry: on a given
Friday, an employee who has worked on 2 projects submits his timesheet. You input the
timesheet date, the employee-Id, the project number and the hours for the first project, creating
one row in the table and then you repeat for the second project, creating another row in the table.
Now, if you have an application (in VB, Powerbuilder or Access) that wants to print a timesheet
report, it will probably print something like this:

IMPORTANT: THE MASTER/DETAIL FORM

This is a very standard form format. It's called a Master/Detail form.

In business applications you will use dozens of these: Orders,
Invoices, Purchase Orders, PO Requisitions, etc.

What they all have in common is that there is a Master section which
contains information on the transaction as a whole, and a Detail
section which contains information on the details of the transaction.

In an Invoice, for example, the invoice date, customer name and
address, shipping date are in the Master while items purchased,
quantities, prices are in the Detail section.

It is very difficult to produce a Master/Detail form from a single table.

Therefore, what we will do in our ProjectMgt database is normalize the Timesheet table into a
Timesheet-Master table and a Timesheet-Detail table. Master will contain the Timesheet number
as Pk, the Employee-Id and the Timesheet date (all the information common to all transactions).
Detail will contain the Timesheet number, Project number and Hours-worked for each project.

Since there may be several Project numbers associated with one Timesheet number in Detail, we
will assign Timesheet number + Project number as Pk for the Detail table.

You may feel a bit overwhelmed at this point. Take your time.

You should download a new copy of The Project management database now. If you prefer to
work with the 97 version, go to the Downloads area - it has several versions of the database.
Study it carefully and try to relate the design of the database to the Timesheet form shown
above. Remember that a database is not a theoretical concept - it has to be applied to real-life
applications.


USING MULTIPLE TABLES IN A SELECT

Let's go back to the ProjectMgt example.

When you have to look at Employee data you do a SELECT from the Employee table.
Remember that the Employee table contains the employee's department but as a number only.
When you run the SELECT you can't tell what the department's name is from the output.

SELECT e_Id, e_Fname, e_Lname, e_Dept
FROM Employee;

When you look at data from 2 or more tables in a SQL statement, the operation is called a JOIN.
You are in fact joining 2 tables to provide the result needed. In this lesson the join is written
entirely with the SELECT statement, using its FROM and WHERE clauses. (SQL also has an explicit
INNER JOIN ... ON syntax, supported by both Access and MySQL, but we won't need it here.)

In the example above, you want to see the department's name instead of its number when you
look at an employee record. Since the department name is in the Department table and all the
other fields are in the Employee table, it is fairly obvious that you will have to open 2 tables in
the SELECT. Let's try it:

SELECT e_Id, e_Fname, e_Lname,
e_Dept, d_DeptNum, d_DeptName
FROM Employee, Department;

It should be immediately obvious to you that although the query worked, it produced way, way
too much data.

And that brings us to talk about how a Join operation works.

When you tell SQL to join 2 tables, it really joins them! In fact, it joins every row in the first
table with every row in the second table. If the first table, Employee, contains 5 rows and the
second table, Department, contains 3 rows, the result displays 5 x 3 = 15 rows. Which is what
happened in the example above. However, since there are only 5 employees, it means that 10 of
those rows are meaningless.

The trick to know about joining tables is fairly simple, yet absolutely crucial:

The tables you are joining must have common columns. Those columns don't have to have
the same name but they must contain the same kind of data: same datatype and size.
e_Dept and d_DeptNum are both Numeric, Long integer and the Dept. numbers assigned
to employees exist in the Department table.

The only meaningful information in a JOIN operation is that which occurs when data in the
two common columns is the same.

In database jargon, the field that is used as a reference from one table is called a foreign
key (Fk) and it must correspond to another field which is a primary key (Pk) in its table. In
our example, e_Dept is a Fk in the Employee table and d_DeptNum is a Pk in the
Department table.

The thing to recognize about the result of the query above is that the only good results are the
ones where e_Dept and d_DeptNum are the same.
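
The 5 x 3 arithmetic can be reproduced in a few lines. This is a sketch that assumes Python's built-in sqlite3 module in place of Access; the five employees and three departments are invented, and the matching rows are picked out in Python simply to count them.

```python
import sqlite3

# In-memory SQLite database standing in for ProjectMgt; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (d_DeptNum INTEGER PRIMARY KEY, d_DeptName TEXT)")
conn.execute(
    "CREATE TABLE Employee (e_Id TEXT PRIMARY KEY, e_Fname TEXT, "
    "e_Lname TEXT, e_Dept INTEGER)"
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(10, "Accounting"), (20, "Engineering"), (30, "Sales")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("1111", "Ann", "Lee", 10), ("2222", "Bob", "Ray", 20),
     ("3333", "Cy", "Poe", 20), ("4444", "Di", "Fox", 30),
     ("5555", "Ed", "Kim", 10)],
)

# Two tables in FROM and no condition: every employee meets every department.
rows = conn.execute(
    "SELECT e_Id, e_Dept, d_DeptNum, d_DeptName FROM Employee, Department"
).fetchall()

# Only the rows where the foreign key equals the primary key mean anything.
matching = [row for row in rows if row[1] == row[2]]

print(len(rows), "rows in total")
print(len(matching), "rows where e_Dept and d_DeptNum agree")
```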

So, we implement the JOIN with a WHERE clause:

SELECT e_Id, e_Fname, e_Lname,
e_Dept, d_DeptNum, d_DeptName
FROM Employee, Department
WHERE e_Dept = d_DeptNum;

Let's look at more examples:

List all the timesheets, showing the employee's name and phone.
SELECT tm_Num, tm_Date, tm_EmpID,
e_Fname, e_Lname, e_Tel
FROM Employee, TS_Master
WHERE tm_EmpID = e_ID;

List all the timesheets, showing project titles, start and end dates.
SELECT td_Num, td_ProjNum, td_Hours,
p_Title, p_StartDate, p_EndDate
FROM Project, TS_Detail
WHERE td_ProjNum = p_Number;

To obtain a particular employee's timesheets, add the condition to the WHERE clause:

SELECT tm_Num, tm_Date, tm_EmpID,
e_Fname, e_Lname, e_Tel
FROM Employee, TS_Master
WHERE tm_EmpID = e_ID
AND tm_EmpID = 'A1111';

You may be able to guess from the previous examples that joining 3 or 4 tables requires that all
tables have pairs of common columns.

To obtain data from the Department, Employee and TS_Master tables we have to know that Dept.
Number exists in both Employee and Department and that Employee ID exists in both Employee
and Timesheet Master:

SELECT tm_Num, tm_Date, tm_EmpID,
e_Fname, e_Lname, d_DeptName
FROM Employee, TS_Master, Department
WHERE tm_EmpID = e_ID
AND e_Dept = d_DeptNum;

To list all timesheets, with employee names and project titles, we know that Timesheet Number
exists in both Timesheet Master and Timesheet Detail, that Employee Id exists in both Employee
and Timesheet Master and finally, that Project Number exists in both Timesheet Detail and
Project:

SELECT tm_Num, tm_Date, tm_EmpID, td_ProjNum,
e_Fname, e_Lname, p_Title
FROM Employee, TS_Master, TS_Detail, Project
WHERE tm_Num = td_Num
AND tm_EmpID = e_ID
AND td_ProjNum = p_Number;

OK, so it doesn't look all that great! But it works. All you have to do is arrange the column
names and use the ORDER BY clause to sort it in proper order. And again, if you want to see the
timesheets relating to a particular project, modify the WHERE clause:

SELECT tm_Num, tm_Date, tm_EmpID, td_ProjNum,
e_Fname, e_Lname, p_Title
FROM Employee, TS_Master, TS_Detail, Project
WHERE tm_Num = td_Num
AND tm_EmpID = e_ID
AND td_ProjNum = p_Number
AND td_ProjNum = 'C33333';

The 'JOIN' Formula

Joining multiple tables is not difficult as long as the database is designed
properly: tables that are to be joined must have columns in common.
The formula is applied in the WHERE clause:
WHERE table1_column_w = table2_column_x
AND table2_column_y = table3_column_z AND ...
If 2 tables have no common columns they cannot be joined. For example, if we
still had the Products table in our database, we couldn't join Products and
Employee or Products and Project because there is no common data in those
tables.
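
As a rough illustration of the formula, the sketch below joins 4 tables with 3 join conditions, then drops one condition to show what goes wrong. It assumes SQLite (via Python's sqlite3 module) in place of Access, with made-up sample rows and only the columns the example needs.

```python
import sqlite3

# In-memory SQLite database standing in for Access; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (e_ID TEXT PRIMARY KEY, e_Lname TEXT);
    CREATE TABLE Project (p_Number TEXT PRIMARY KEY, p_Title TEXT);
    CREATE TABLE TS_Master (tm_Num INTEGER PRIMARY KEY, tm_EmpID TEXT);
    CREATE TABLE TS_Detail (td_Num INTEGER, td_ProjNum TEXT, td_Hours REAL);
    INSERT INTO Employee VALUES ('A1111', 'Lee'), ('B2222', 'Ray');
    INSERT INTO Project VALUES ('C33333', 'Bridge'), ('D44444', 'Tunnel');
    INSERT INTO TS_Master VALUES (1, 'A1111'), (2, 'B2222');
    INSERT INTO TS_Detail VALUES (1, 'C33333', 8), (1, 'D44444', 4),
                                 (2, 'C33333', 6);
""")

# 4 tables, 3 join conditions: one row per timesheet detail line.
full = conn.execute("""
    SELECT tm_Num, e_Lname, p_Title
    FROM Employee, TS_Master, TS_Detail, Project
    WHERE tm_Num = td_Num
      AND tm_EmpID = e_ID
      AND td_ProjNum = p_Number
    ORDER BY tm_Num, p_Title
""").fetchall()

# Same query with the Project join left out: each detail line is now
# paired with every project, so the row count multiplies.
missing_join = conn.execute("""
    SELECT tm_Num, e_Lname, p_Title
    FROM Employee, TS_Master, TS_Detail, Project
    WHERE tm_Num = td_Num
      AND tm_EmpID = e_ID
""").fetchall()

print(full)
print(len(missing_join), "rows when one join condition is missing")
```

A quick sanity check that follows from this: count the tables in the FROM clause and make sure the WHERE clause has at least one join condition fewer than that.
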

The great thing about JOINS is that once you've mastered the technique you can obtain
information from anywhere in the database. It may involve 4 or 5 or 10 joins but, so what!

The Boss wants to know which Departments are involved in projects in Germany at the moment.

Follow the joins:

SELECT DISTINCT d_DeptName, td_ProjNum, p_Title, p_Country
FROM Department, Employee, TS_Master, TS_Detail, Project
WHERE td_ProjNum = p_Number
AND tm_Num = td_Num
AND tm_EmpID = e_ID
AND e_Dept = d_DeptNum
AND p_Country LIKE 'Germany*'
AND DATE() BETWEEN p_StartDate AND p_EndDate;

There are several points that should be noted about this query:

• If a project in Germany has many timesheets submitted on it from one department, each
occurrence will generate one row - we only want to know the name of the department, not
how many times it shows up, so we use the DISTINCT keyword.

• In the WHERE clause, always do the joins first - there are 5 tables involved and
therefore, there are 4 joins.

• Whenever you are comparing to a string or text field, use the LIKE operator with a
wildcard (* in Access, % in most other databases) - the country could have been mistakenly
entered as "Germany " in the project data - the strings "Germany" and "Germany " do not match.

• The Boss said "...in Germany at the moment". Listen to the question. That means
currently active. You don't want projects that are already over or that haven't started yet. If
today is between the start and end dates, the project is currently active.

• If there is an active project in Germany but it hasn't had timesheets submitted for it yet, it
won't show up in the list. There is a way to list it in a query and we'll cover that in the
next Lesson.
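
The points above can be tried out with a small sketch. It assumes SQLite (via Python's sqlite3 module) rather than Access, so the wildcard is % instead of * and today's date comes from date('now') instead of DATE(). The tables are trimmed to the columns needed and all rows are invented; project dates are set relative to today so the example behaves the same whenever it is run.

```python
import sqlite3

# In-memory SQLite database standing in for Access; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (d_DeptNum TEXT PRIMARY KEY, d_DeptName TEXT);
    CREATE TABLE Employee (e_ID TEXT PRIMARY KEY, e_Dept TEXT);
    CREATE TABLE TS_Master (tm_Num INTEGER PRIMARY KEY, tm_EmpID TEXT);
    CREATE TABLE TS_Detail (td_Num INTEGER, td_ProjNum TEXT);
    CREATE TABLE Project (p_Number TEXT PRIMARY KEY, p_Title TEXT,
                          p_Country TEXT, p_StartDate TEXT, p_EndDate TEXT);

    INSERT INTO Department VALUES ('D1', 'Sales'), ('D2', 'Research');
    INSERT INTO Employee VALUES ('A1111', 'D1'), ('B2222', 'D2');
    INSERT INTO Project VALUES
        ('P1', 'Plant', 'Germany',  date('now', '-30 days'), date('now', '+30 days')),
        ('P2', 'Depot', 'Germany ', date('now', '-30 days'), date('now', '+30 days')),
        ('P3', 'Audit', 'Germany',  date('now', '-90 days'), date('now', '-10 days')),
        ('P4', 'Store', 'France',   date('now', '-30 days'), date('now', '+30 days'));
    INSERT INTO TS_Master VALUES (1, 'A1111'), (2, 'A1111'), (3, 'B2222'),
                                 (4, 'B2222'), (5, 'A1111');
    INSERT INTO TS_Detail VALUES (1, 'P1'), (2, 'P1'), (3, 'P3'),
                                 (4, 'P2'), (5, 'P4');
""")

# The joins come first, then the filters, as in the lesson's query.
QUERY = """
    SELECT {distinct} d_DeptName, td_ProjNum, p_Title, p_Country
    FROM Department, Employee, TS_Master, TS_Detail, Project
    WHERE td_ProjNum = p_Number
      AND tm_Num = td_Num
      AND tm_EmpID = e_ID
      AND e_Dept = d_DeptNum
      AND p_Country LIKE 'Germany%'
      AND date('now') BETWEEN p_StartDate AND p_EndDate
    ORDER BY d_DeptName
"""

every_row = conn.execute(QUERY.format(distinct="")).fetchall()
active = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()

print(len(every_row), "rows without DISTINCT")
for row in active:
    print(row)
```

In this sample data, Sales has two timesheets on project P1, so it shows up twice until DISTINCT is added. Project P2 is kept despite the trailing space in its country, P3 is dropped because it has already ended, and P4 is dropped because it is in France.
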

Lesson 8 - Specialized Joins

USING ALIASES AKA NICKNAMES

An alias is a nickname, a second name given to an object. In SQL there are a few occasions
where you may want to use aliases for tables.

Case 1: you have 2 tables with the same field names. It happens frequently and there is no
particular problem with it. We have been using different names, with prefixes, for all columns
but not everyone does that.

Suppose that you used EmpID in both the TS_Master and Employee tables. Now, when you do a
join on the tables, like this:

SELECT tm_Num, tm_Date, EmpID,
e_Fname, e_Lname, e_Tel
FROM Employee, TS_Master
WHERE EmpID = EmpID
AND EmpID = 'A1111';

you soon run into all kinds of problems and all of them will mention something
about ...ambiguous reference... which means that the system doesn't know what the heck you're
talking about - it can't figure out which EmpID you are referring to because there are 2 of them.

The solution is to add the table names, with dot notation, to the fields which are ambiguous:

SELECT tm_Num, tm_Date, TS_Master.EmpID,
e_Fname, e_Lname, e_Tel
FROM Employee, TS_Master
WHERE TS_Master.EmpID = Employee.EmpID
AND TS_Master.EmpID = 'A1111';

Now, adding Table_name. to all fields is standard SQL syntax. If you look at SQL code
generated by the Access query wizard, you will see that it is done all the time, regardless of
whether the name is ambiguous or not. But in most applications where field names are all
different you don't bother doing it because it's just too much extra work.

Which brings me to my next point. If you're a typical programmer, you will want to save every
keystroke you can. Typing table names all over the place is a pain. To avoid it, use aliases for the
tables - the alias is a name that you give the table in the FROM clause and then you use it
everywhere else in the statement:

SELECT tm_Num, tm_Date, T.EmpID,
e_Fname, e_Lname, e_Tel
FROM Employee E, TS_Master T
WHERE T.EmpID = E.EmpID
AND T.EmpID = 'A1111';

OK, so it may not be an earth-shattering improvement but it is an improvement.

The standard syntax for using an alias is:

SELECT Alias1.field, Alias2.field, ...
FROM Table_name AS Alias1, Table_name AS Alias2 ...;

The 'AS' keyword is optional and most people don't use it.

Don't worry about using the alias in the SELECT clause before it's been named in the FROM
clause - that's the way it works.
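
Here is a small sketch of Case 1 from start to finish: the ambiguous query fails, and the aliased one works. It assumes SQLite (via Python's sqlite3 module) in place of Access, so the exact wording of the error message will differ from what Access reports. The sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for Access; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpID TEXT PRIMARY KEY, e_Fname TEXT,
                           e_Lname TEXT, e_Tel TEXT);
    CREATE TABLE TS_Master (tm_Num INTEGER PRIMARY KEY, tm_Date TEXT,
                            EmpID TEXT);
    INSERT INTO Employee VALUES ('A1111', 'Ann', 'Lee', '555-0100'),
                                ('B2222', 'Bob', 'Ray', '555-0101');
    INSERT INTO TS_Master VALUES (1, '2024-01-05', 'A1111'),
                                 (2, '2024-01-05', 'B2222'),
                                 (3, '2024-01-12', 'A1111');
""")

# EmpID exists in both tables, so the unqualified name is rejected.
error = None
try:
    conn.execute("""
        SELECT tm_Num, tm_Date, EmpID
        FROM Employee, TS_Master
        WHERE EmpID = EmpID
    """)
except sqlite3.OperationalError as exc:
    error = str(exc)
print("Error:", error)

# Aliases E and T are declared in FROM and used everywhere else.
rows = conn.execute("""
    SELECT tm_Num, tm_Date, T.EmpID, e_Fname, e_Lname, e_Tel
    FROM Employee AS E, TS_Master AS T
    WHERE T.EmpID = E.EmpID
      AND T.EmpID = 'A1111'
    ORDER BY tm_Num
""").fetchall()
for row in rows:
    print(row)
```

Only the columns that exist in both tables need the alias prefix; tm_Num, e_Fname and the others are unique and can stay as they are.
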

Case 2: You have to join 2 foreign keys to one primary key:

In our application, Department Head, the column d_Head, contains an employee ID. If you want
to create a query to display both the employee's name and the department head's name, you will
have to join 2 Fk to the Employee Pk. But a single reference to the Employee table can't satisfy
both joins: each row would need e_ID to equal both tm_EmpID and d_Head at the same time. The
solution is to use aliases for the Employee table - in fact, consider the Employee as if it were 2
tables, one for the employee's name and the other one for the department head's name.

SELECT tm_Num, tm_EmpID, E1.e_Lname, E1.e_Fname,
d_DeptName, d_Head, E2.e_Lname, E2.e_Fname
FROM TS_Master, Department, Employee E1, Employee E2
WHERE tm_EmpID = E1.e_ID
AND d_Head = E2.e_ID
AND E1.e_Dept = d_DeptNum;

The same technique applies when you must show an employee's name and a project leader's
name from timesheet information. In the application, project leaders are only identified in the
Project table by their employee ID. As in the previous example, you have to use an alias for the
Employee table to get both employee name and project leader name in the query.

SELECT tm_Num, tm_EmpID, E1.e_Lname, E1.e_Fname, td_ProjNum,
p_Title, p_Leader, E2.e_Lname, E2.e_Fname
FROM TS_Master, TS_Detail, Project, Employee E1, Employee E2
WHERE tm_Num = td_Num
AND p_Number = td_ProjNum
AND tm_EmpID = E1.e_ID
AND p_Leader = E2.e_ID;
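
To make Case 2 concrete, the sketch below runs the department head version of the query on a tiny data set. It assumes SQLite (via Python's sqlite3 module) in place of Access, and the rows are invented: Ann Lee submits a timesheet, and Bob Ray heads her department.

```python
import sqlite3

# In-memory SQLite database standing in for Access; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (d_DeptNum TEXT PRIMARY KEY, d_DeptName TEXT,
                             d_Head TEXT);
    CREATE TABLE Employee (e_ID TEXT PRIMARY KEY, e_Fname TEXT,
                           e_Lname TEXT, e_Dept TEXT);
    CREATE TABLE TS_Master (tm_Num INTEGER PRIMARY KEY, tm_EmpID TEXT);
    INSERT INTO Department VALUES ('D1', 'Sales', 'B2222');
    INSERT INTO Employee VALUES ('A1111', 'Ann', 'Lee', 'D1'),
                                ('B2222', 'Bob', 'Ray', 'D1');
    INSERT INTO TS_Master VALUES (1, 'A1111');
""")

# E1 is the Employee table read as "who submitted the timesheet";
# E2 is the same table read as "who heads the department".
rows = conn.execute("""
    SELECT tm_Num, tm_EmpID, E1.e_Lname, d_DeptName, d_Head, E2.e_Lname
    FROM TS_Master, Department, Employee E1, Employee E2
    WHERE tm_EmpID = E1.e_ID
      AND d_Head = E2.e_ID
      AND E1.e_Dept = d_DeptNum
""").fetchall()

for row in rows:
    print(row)
```

The single output row carries two different last names taken from the same Employee table, one through each alias.
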

OUTER JOINS

The joins we have been doing so far have all been Inner Joins. That is the kind of join that is
done by default when you join two or more tables. It means that the only rows displayed by the
query are those where the join condition is satisfied, that is, where a matching row exists in
each of the joined tables. But what if some rows in one table have no match in the other and you
still want to see them? There is another form of join for those cases and it's called an Outer Join.

In the previous lesson we had a case where we wanted to see all the departments involved in
projects in Germany. The only way to get that is by joining all the tables involved in providing
the chain from 'Project' to 'Department'. Since all the joins are inner joins by default, only those
rows where the joined columns match in both tables will be displayed.

Let's look at the example again, but simplified.

Query: List all countries for which there are timesheets.

SELECT p_Number, p_Title, p_Country, td_Num, td_Hours
FROM Project, TS_Detail
WHERE p_Number = td_ProjNum;

You get the list of all countries for which timesheets have been submitted:

However, if you want to see all countries for which there are projects, including those with no
timesheets, you must do an outer join which, in Access, is written as LEFT JOIN because you
will see all the rows from the table named on the left in the JOIN clause:

SELECT p_Number, p_Title, p_Country, td_Num, td_Hours
FROM Project LEFT JOIN TS_Detail
ON Project.p_Number = TS_Detail.td_ProjNum;

For some reason Access insists on having the table_name included in the ON clause.
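
The difference between the two queries is easy to see on a couple of rows. The sketch below assumes SQLite (via Python's sqlite3 module) in place of Access; LEFT JOIN is written the same way in both. The rows are invented: one project with a timesheet and one without.

```python
import sqlite3

# In-memory SQLite database standing in for Access; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (p_Number TEXT PRIMARY KEY, p_Title TEXT,
                          p_Country TEXT);
    CREATE TABLE TS_Detail (td_Num INTEGER, td_ProjNum TEXT, td_Hours REAL);
    INSERT INTO Project VALUES ('P1', 'Plant', 'Germany'),
                               ('P2', 'Depot', 'France');
    INSERT INTO TS_Detail VALUES (1, 'P1', 8);
""")

# Inner join: only projects that have at least one timesheet line.
inner = conn.execute("""
    SELECT p_Number, p_Title, p_Country, td_Num, td_Hours
    FROM Project, TS_Detail
    WHERE p_Number = td_ProjNum
    ORDER BY p_Number
""").fetchall()

# Left outer join: every project, with NULLs where no timesheet exists.
left = conn.execute("""
    SELECT p_Number, p_Title, p_Country, td_Num, td_Hours
    FROM Project LEFT JOIN TS_Detail
      ON Project.p_Number = TS_Detail.td_ProjNum
    ORDER BY p_Number
""").fetchall()

print(inner)
print(left)
```

The project in France has no timesheets, so it only appears in the LEFT JOIN result, with the timesheet columns empty (NULL in SQL, shown as None in Python).
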

You get the list of all countries even if timesheets have not been submitted.

Outer joins are not used a lot in SQL but they do have a few crucial applications.

If you ever have to design any kind of scheduling or reservations system, you will have to use
Outer Joins.

For example, in a Doctor's office where you have an Appointments table joined to a Patients
table, the only way you can see both the booked slots and the empty slots in a query is to use an
Outer Join.

For Hotel reservations where you have to see the rooms which are occupied as well as those that
are free, you will have to Outer Join the tables Rooms and Customers.
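
A rough sketch of the Doctor's office case follows. The table layout is an assumption, not part of the tutorial's application: each row of a hypothetical Appointments table is one time slot, and its patient column stays empty until the slot is booked. SQLite (via Python's sqlite3 module) stands in for Access.

```python
import sqlite3

# Hypothetical schema and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (pt_ID TEXT PRIMARY KEY, pt_Name TEXT);
    CREATE TABLE Appointments (a_Slot TEXT PRIMARY KEY, a_PatientID TEXT);
    INSERT INTO Patients VALUES ('P1', 'Ann'), ('P2', 'Bob');
    INSERT INTO Appointments VALUES ('09:00', 'P1'),
                                    ('09:30', NULL),
                                    ('10:00', 'P2');
""")

# Inner join: the free 09:30 slot disappears from the schedule.
booked_only = conn.execute("""
    SELECT a_Slot, pt_Name
    FROM Appointments, Patients
    WHERE a_PatientID = pt_ID
    ORDER BY a_Slot
""").fetchall()

# Outer join: every slot is listed, booked or not.
schedule = conn.execute("""
    SELECT a_Slot, pt_Name
    FROM Appointments LEFT JOIN Patients
      ON Appointments.a_PatientID = Patients.pt_ID
    ORDER BY a_Slot
""").fetchall()

print(booked_only)
print(schedule)
```

The free slot is exactly the row where the patient name comes back empty, which is what a booking screen needs to display.
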
