Simultaneous update
Granular access
The DBMS can limit users to specific records within tables, and to specific columns
within records (or both).
Standard “API” language
SQL is considered an “Application Programming Interface”, as it is
designed primarily to make requests to an existing program (i.e. the DBMS).
2) Database Functionality
Create/alter/drop
o (Note the pairings of column name, data type and (optional) constraints.)
o create table bc_employee
(emp_id number(3) primary key,
emp_name char(15) not null,
emp_start date not null);
o alter table bc_employee
add(new_column varchar2(25) unique);
alter table bc_employee
modify(column_name varchar2(50));
o drop table bc_employee;
(Note that this would also drop any related entities, such as indices
or permissions.)
o describe bc_employee
Sequences: Purpose, mechanism
Sequences are essentially key generators. They return a unique number on
request. The number is very often used as the “dumb” key.
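The key-generator idea can be sketched in Python with the standard-library sqlite3 module. SQLite has no sequence object, so this sketch emulates one with a one-row counter table; the names emp_seq and nextval() are hypothetical and only illustrate the mechanism, not how any particular DBMS implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_seq (last_value INTEGER NOT NULL);   -- hypothetical one-row counter
    INSERT INTO emp_seq VALUES (0);
    CREATE TABLE bc_employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
""")

def nextval():
    """Return the next unused number; each call advances the counter."""
    with conn:  # commit the increment so the same number is not handed out twice
        conn.execute("UPDATE emp_seq SET last_value = last_value + 1")
        return conn.execute("SELECT last_value FROM emp_seq").fetchone()[0]

# The generated number is used as the "dumb" key: it carries no business meaning.
for name in ("Ann", "Bob"):
    conn.execute("INSERT INTO bc_employee VALUES (?, ?)", (nextval(), name))
conn.commit()

keys = [row[0] for row in conn.execute("SELECT emp_id FROM bc_employee ORDER BY emp_id")]
```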
Views: dynamic “partial” tables
Views are definitions of answer sets that provide granular access or hide complex
logic. Views are logical filters on tables which allow us to limit access to the table
to specific rows and columns (granular access).
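A minimal sketch of a view acting as a row-and-column filter, run through Python's sqlite3 module rather than the DBMS used in these notes. The emp_dept and emp_salary columns and the sales_staff view are assumptions made for the illustration. SQLite has no user permissions, so the step of granting access on the view instead of the table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bc_employee (
        emp_id     INTEGER PRIMARY KEY,
        emp_name   TEXT NOT NULL,
        emp_dept   TEXT NOT NULL,   -- hypothetical column
        emp_salary REAL NOT NULL    -- hypothetical column
    );
    INSERT INTO bc_employee VALUES (1, 'Ann', 'SALES', 52000);
    INSERT INTO bc_employee VALUES (2, 'Bob', 'IT',    61000);
    INSERT INTO bc_employee VALUES (3, 'Cy',  'SALES', 48000);

    -- The view exposes only SALES rows and hides the salary column.
    CREATE VIEW sales_staff AS
        SELECT emp_id, emp_name
        FROM bc_employee
        WHERE emp_dept = 'SALES';
""")

# Querying the view looks exactly like querying a table.
view_rows = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id").fetchall()
```

Because the view is only a stored definition, rows added to bc_employee later appear in sales_staff automatically.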
If a column name exists in more than one of the tables, it must be “qualified”
(prefixed with its table name or alias) when used. This also makes the script
more readable.
Outer Join
Indicates that the right-hand table is the “master table” and that all of its rows
should be included, whether or not they have a matching row in the left table.
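The outer-join behaviour can be sketched with Python's sqlite3 module. The notes describe the case where the right-hand table is the master (a right outer join). Older SQLite builds lack RIGHT JOIN, so this sketch writes the master table on the left of a LEFT OUTER JOIN, which is the mirror image of the same idea. The customer and purchase tables are hypothetical. Note that cust_id exists in both tables and is therefore qualified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE purchase (po_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cord');
    INSERT INTO purchase VALUES (10, 1, 250.0), (11, 1, 75.0), (12, 3, 40.0);
""")

# customer is the master table: every customer appears, with NULL (None in
# Python) in the purchase columns when no matching purchase exists.
outer_rows = conn.execute("""
    SELECT c.cust_name, p.po_id
    FROM customer c
    LEFT OUTER JOIN purchase p ON p.cust_id = c.cust_id
    ORDER BY c.cust_id, p.po_id
""").fetchall()

# An inner join, for comparison, silently drops customers with no purchases.
inner_rows = conn.execute("""
    SELECT c.cust_name, p.po_id
    FROM customer c
    JOIN purchase p ON p.cust_id = c.cust_id
    ORDER BY c.cust_id, p.po_id
""").fetchall()
```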
Subqueries
Subqueries can be used to provide a needed calculation before the WHERE clause
evaluates a record.
Independent Subquery
Returns a constant that will be used to filter all records.
Dependent Subquery
Generates a distinct value for each row in the main query, based on FROM tables.
They are run multiple times, so that each row in the main query is tested against
data relevant to it.
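The two kinds of subquery can be compared side by side. This is a sketch using Python's sqlite3 module; the emp table and its values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept TEXT, salary REAL);
    INSERT INTO emp VALUES
        (1, 'SALES', 40000), (2, 'SALES', 60000),
        (3, 'IT',    70000), (4, 'IT',    90000);
""")

# Independent subquery: it does not mention the outer row, so it yields one
# constant (the overall average salary) that filters every record.
above_overall = conn.execute("""
    SELECT emp_id FROM emp
    WHERE salary > (SELECT AVG(salary) FROM emp)
    ORDER BY emp_id
""").fetchall()

# Dependent (correlated) subquery: it refers to e.dept from the outer row, so
# each employee is compared against the average of his or her own department.
above_dept = conn.execute("""
    SELECT e.emp_id FROM emp e
    WHERE e.salary > (SELECT AVG(d.salary) FROM emp d WHERE d.dept = e.dept)
    ORDER BY e.emp_id
""").fetchall()
```

With this sample data the two filters pick different employees: the first uses a single threshold for everyone, the second a threshold per department.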
Union
Union clauses allow us to run a query that has two or more completely
independent selects and answer sets. The restriction is that the answer sets from the
various queries must be consistent: the same number of columns, with compatible
data types in each position. Union implies a “distinct” function, so any
duplicate rows will be eliminated from the answer set.
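The implied "distinct" can be seen by comparing UNION with UNION ALL. This is a sketch using Python's sqlite3 module; the staff and contractor tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff      (name TEXT, city TEXT);
    CREATE TABLE contractor (name TEXT, city TEXT);
    INSERT INTO staff      VALUES ('Ann', 'Boston'), ('Bob', 'Denver');
    INSERT INTO contractor VALUES ('Ann', 'Boston'), ('Cy', 'Austin');
""")

# Both selects return two text columns, so the answer sets are consistent.
union_rows = conn.execute("""
    SELECT name, city FROM staff
    UNION
    SELECT name, city FROM contractor
    ORDER BY name
""").fetchall()

# UNION ALL skips the duplicate elimination, so ('Ann', 'Boston') appears twice.
union_all_rows = conn.execute("""
    SELECT name, city FROM staff
    UNION ALL
    SELECT name, city FROM contractor
    ORDER BY name
""").fetchall()
```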
Data-type processing
String and number functions
o to_char(emp_salary, '$999,999.00')
o WHERE to_char(in_date, 'Dy') = 'Fri'
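For readers who want to experiment without a database, the two conversions above have rough Python analogues. This is only an approximation: to_char pads its result to the width of the format mask, which the Python formatting below does not do. The sample salary and date are invented.

```python
from datetime import date

emp_salary = 52000
in_date = date(2024, 3, 15)  # hypothetical sample date

# Rough analogue of to_char(emp_salary, '$999,999.00'):
# thousands separator and two decimal places.
formatted_salary = f"${emp_salary:,.2f}"

# Rough analogue of to_char(in_date, 'Dy'): a three-letter day abbreviation.
# A fixed list is used so the result does not depend on the machine's locale.
DAY_ABBREVS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_abbrev = DAY_ABBREVS[in_date.weekday()]

# Equivalent of the WHERE test: keep the record only if the date is a Friday.
is_friday = day_abbrev == "Fri"
```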
Final Exam Study Guide
6) Procedural Logic (Final exam focus: concepts - not syntax)
Logic Basics. Procedures have 4 functional aspects.
1) Inputs
Defining the “arguments”, i.e. the required inputs/outputs the procedure
needs to run.
2) Flow control
Branching and iteration logic, e.g. if-then-else
3) Assignments (variables)
The ability to store information for internal use, such as saving the data
from one query to use in another.
4) Exception handling
The ability to react to errors and unexpected conditions.
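The four aspects can be pointed out in one small routine. The sketch below is written in Python against sqlite3 rather than in a database procedural language, since the focus here is concepts and not syntax. The function give_raise and the emp table are hypothetical.

```python
import sqlite3

def give_raise(conn, emp_id, pct):
    """Hypothetical procedure: raise one employee's salary by pct percent."""
    # 1) Inputs: conn, emp_id and pct are the arguments; the new salary is the output.
    try:
        # 3) Assignment: save the result of one query for use in the next.
        row = conn.execute(
            "SELECT salary FROM emp WHERE emp_id = ?", (emp_id,)
        ).fetchone()
        # 2) Flow control: branch on what the query found and on the input.
        if row is None:
            raise LookupError(f"no employee {emp_id}")
        if pct <= 0:
            raise ValueError("pct must be positive")
        new_salary = round(row[0] * (1 + pct / 100), 2)
        conn.execute(
            "UPDATE emp SET salary = ? WHERE emp_id = ?", (new_salary, emp_id)
        )
        conn.commit()
        return new_salary
    except (LookupError, ValueError, sqlite3.Error):
        # 4) Exception handling: undo any partial work and report failure.
        conn.rollback()
        return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, salary REAL)")
conn.execute("INSERT INTO emp VALUES (1, 50000)")
conn.commit()
```

Calling give_raise(conn, 1, 10) updates the row and returns the new salary, while an unknown emp_id or a non-positive pct returns None and leaves the table unchanged.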
2) Example
Foreign key constraints ensure that a value exists in another table before it
can be entered in the referencing table.
A purchase order that references a customer may require not only that the
customer exists, but also that the customer is active and has a good credit rating.
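The difference between what a foreign key can check and what needs procedural logic can be sketched as follows. This uses Python's sqlite3 module; the tables, the active and credit columns, and the place_order function are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        active  INTEGER NOT NULL,      -- hypothetical flag: 1 = active
        credit  TEXT NOT NULL          -- hypothetical rating: 'GOOD' or 'BAD'
    );
    CREATE TABLE purchase_order (
        po_id   INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id)
    );
    INSERT INTO customer VALUES (1, 1, 'GOOD'), (2, 0, 'GOOD');
""")

def place_order(po_id, cust_id):
    """The foreign key checks existence; this code checks the extra business rules."""
    row = conn.execute(
        "SELECT active, credit FROM customer WHERE cust_id = ?", (cust_id,)
    ).fetchone()
    if row is not None and (row[0] != 1 or row[1] != 'GOOD'):
        return "rejected: customer inactive or poor credit"
    try:
        conn.execute("INSERT INTO purchase_order VALUES (?, ?)", (po_id, cust_id))
        conn.commit()
        return "accepted"
    except sqlite3.IntegrityError:
        # Raised by the foreign key constraint when the customer row is missing.
        conn.rollback()
        return "rejected: customer does not exist"
```

Customer 2 exists, so the foreign key alone would accept an order for it; only the procedural check notices that the customer is inactive.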
Procedures
Procedures are very efficient, both in operation (the client can get a lot of work
done with one call to the database) and in maintenance (the application logic can
change without changing complex web programming logic).
Command-driven
Advantages
a. Force users to use proper sequence of SQL commands to accomplish a
business transaction.
b. Insulate users from needing to react to changes in the database.
c. Performance gains and developer ease.
d. Stronger likelihood that designer of the application database will build a
better sequence of commands.
Arguments
o Variable-name, “in” or “out” or “inout”, datatype
accomplished with no need for any user to change their local
SQL code.
Remote DB links
Remote DB links allow us to access tables on other database servers
as if they were local, which avoids the nightmare of copying and maintaining
data wherever it is needed.
There is a “proxy” user which logs on to the remote database
instance and provides whatever access it owns to the local users.
Occasional updates are better handled via remote links than via materialized views (MV).
Negative aspects
Constant remote access generates network traffic.
Network connection might not be available.
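The "as if they were local" idea can be imitated on one machine with SQLite's ATTACH DATABASE, used here purely as a stand-in for a database link. There is no network, no proxy user and no second server in this sketch, and the schema name remote and both tables are hypothetical.

```python
import sqlite3

local = sqlite3.connect(":memory:")
# ATTACH makes a second, separate database reachable under the name "remote".
local.execute("ATTACH DATABASE ':memory:' AS remote")
local.executescript("""
    CREATE TABLE remote.product (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    INSERT INTO remote.product VALUES (1, 'Widget'), (2, 'Gadget');

    CREATE TABLE order_line (line_id INTEGER PRIMARY KEY, prod_id INTEGER, qty INTEGER);
    INSERT INTO order_line VALUES (1, 2, 5);
""")

# The local table joins to the other database's table in one ordinary query;
# no copy of product has to be kept and maintained locally.
linked_rows = local.execute("""
    SELECT o.line_id, p.prod_name, o.qty
    FROM order_line o
    JOIN remote.product p ON p.prod_id = o.prod_id
""").fetchall()
```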
Replication (Materialized views)
Materialized views are tables that exist on a “local” database, which
are copies of a “master” table in another database.
The DBMS manages the constant update of the tables to keep them
equal.
Updates can be performed by sending a specialized “redo log”
to the other database server and having it apply the change.
Materialized views are best when ‘centralized’ information is shared
in ‘read-only’ mode.
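A hand-rolled imitation of a replicated copy, using Python's sqlite3 module. Note the swapped-in technique: instead of the log-based updates described above, which the DBMS performs itself, this sketch does a simple complete refresh on request. The names master_db, price_list, price_list_mv and refresh_mv are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("ATTACH DATABASE ':memory:' AS master_db")  # stand-in for the remote master
db.executescript("""
    CREATE TABLE master_db.price_list (item TEXT PRIMARY KEY, price REAL);
    INSERT INTO master_db.price_list VALUES ('bolt', 0.10), ('nut', 0.05);

    CREATE TABLE price_list_mv (item TEXT PRIMARY KEY, price REAL);  -- local copy
""")

def refresh_mv():
    """Complete refresh: replace the local copy with the master's current rows."""
    with db:  # one transaction: the delete and the reload commit together
        db.execute("DELETE FROM price_list_mv")
        db.execute(
            "INSERT INTO price_list_mv SELECT item, price FROM master_db.price_list"
        )

refresh_mv()

# A change on the master is not visible locally until the next refresh.
db.execute("UPDATE master_db.price_list SET price = 0.12 WHERE item = 'bolt'")
db.commit()
stale = db.execute("SELECT price FROM price_list_mv WHERE item = 'bolt'").fetchone()[0]

refresh_mv()
fresh = db.execute("SELECT price FROM price_list_mv WHERE item = 'bolt'").fetchone()[0]
```

Local reads never touch the master, which is why this arrangement suits shared read-only data; the cost is that the copy is only as current as its last refresh.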
E-T-L
Purpose: to move data from operational to data warehouse
environments.
ETL stands for Extract, Transform and Load, which is a process used
to extract data from various sources, transform the data depending on
business rules/needs and load the data into a destination database.
Extraction
The differences in records (the “delta”) are determined, and a series
of update commands is generated for the warehouse.
Extraction can have issues with consistency/timing.
a) If the operational data is “effective dated”, all records
with an effective date after the last extraction can be
selected.
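Selecting the delta from effective-dated data can be sketched as a single query. This uses Python's sqlite3 module; the price_history table, its rows and the last_extraction date are made up for the illustration.

```python
import sqlite3

ops = sqlite3.connect(":memory:")
ops.executescript("""
    CREATE TABLE price_history (item TEXT, price REAL, effective_date TEXT);
    INSERT INTO price_history VALUES
        ('bolt', 0.10, '2024-01-01'),
        ('nut',  0.05, '2024-01-01'),
        ('bolt', 0.12, '2024-02-15'),
        ('nut',  0.06, '2024-03-01');
""")

last_extraction = "2024-02-01"  # hypothetical date of the previous extract run

# Dates stored as ISO-8601 text (YYYY-MM-DD) sort chronologically, so a plain
# string comparison picks out everything that took effect after the last run.
delta = ops.execute(
    "SELECT item, price, effective_date FROM price_history "
    "WHERE effective_date > ? ORDER BY effective_date",
    (last_extraction,),
).fetchall()
```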
Transformation
Involves changes to make the data more useful for analytical
rather than operational purposes.
3 major considerations
a) “de-normalization” to add foreign attributes to data for
ease of reporting. (assume no further update)
b) standardization – data must be consistent across
multiple sources.
c) Effective dating – to keep multiple copies of historical
information in such a way that we know which version
was relevant at any point of time.
i. A new value does not replace the old value, but
“retires it” and exists with it.
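The "retire, do not replace" rule can be sketched with a pair of validity dates per row. This is one common way to lay it out, assumed here for illustration; the cust_address table, the valid_from and valid_to columns, and both functions are hypothetical.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE cust_address (
        cust_id    INTEGER,
        city       TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT              -- NULL marks the current version
    );
    INSERT INTO cust_address VALUES (1, 'Boston', '2023-01-01', NULL);
""")

def change_city(cust_id, new_city, as_of):
    """Retire the current row and add the new version alongside it."""
    with dw:
        dw.execute(
            "UPDATE cust_address SET valid_to = ? "
            "WHERE cust_id = ? AND valid_to IS NULL",
            (as_of, cust_id),
        )
        dw.execute(
            "INSERT INTO cust_address VALUES (?, ?, ?, NULL)",
            (cust_id, new_city, as_of),
        )

def city_on(cust_id, day):
    """Return the city that was in effect on the given ISO date, if any."""
    row = dw.execute(
        "SELECT city FROM cust_address WHERE cust_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (cust_id, day, day),
    ).fetchone()
    return row[0] if row else None

change_city(1, 'Denver', '2024-06-01')
```

After the change both rows exist: a report about late 2023 still sees Boston, while a report about today sees Denver.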
Loading
Simply the loading of data into the warehouse.
Formatting differences
o Unstructured data, such as Word documents or Excel spreadsheets, needs to
be converted into structured data (SQL tables).
Analytics
Categories (Models are broken into 3 groups)
Predictive – searches for likelihood of a specific behavior,
based on attribute values.
Clustering – searches for overall similarities, based on attribute
values.
Association – searches for events that tend to occur together,
based on occurrences of an event.
Predictive concepts (3 factors in evaluating the model’s output)
Accuracy / Confidence – likelihood of B given A.
Coverage / Strength – likelihood of A in the whole population.
Interest – Difference between accuracy of entire population and
accuracy of a specific subset.
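The three factors can be computed by hand on a tiny sample. In this Python sketch, A is an attribute value and B is the behavior being predicted. Interest is interpreted as the accuracy within the subset that has A minus the rate of B in the entire population; that reading of the definition above is an assumption, and the records are invented.

```python
# Hypothetical sample: each record is (has_attribute_A, shows_behavior_B).
records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
    (False, False), (False, False),
]

total = len(records)
count_a = sum(1 for a, _ in records if a)
count_b = sum(1 for _, b in records if b)
count_a_and_b = sum(1 for a, b in records if a and b)

coverage = count_a / total          # likelihood of A in the whole population
accuracy = count_a_and_b / count_a  # likelihood of B given A
baseline = count_b / total          # likelihood of B in the whole population
interest = accuracy - baseline      # how much knowing A changes the odds of B

print(f"coverage={coverage:.2f} accuracy={accuracy:.2f} interest={interest:.2f}")
```

A rule with high accuracy but interest near zero says little, because B is just as likely without knowing A.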