
ISYS3257

Midterm Study Guide

1) Services provided by DBMS’s


Single point of access
 To ensure consistent application of technical and business rules and access.
Data Integrity – all data / business rules applied consistently

Recovery (forward and backward)


 1) “roll forward” logging – allows the database to recover from an error by utilizing a
backup copy of the data folder, and reapplying all changes from a log.
 2) “roll back” recovery – if a business function that has multiple database changes fails
in the middle, refers back way it was before transaction started.
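
The "roll back" behaviour described above can be sketched with Python's built-in sqlite3 module, used here only as a runnable stand-in for the course DBMS. The account table, its CHECK constraint and the transfer function are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical two-account table; names and amounts are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acct_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, from_id, to_id, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        # First change: credit the receiving account.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acct_id = ?",
            (amount, to_id),
        )
        # Second change: debit the sender; the CHECK rejects an overdraft.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acct_id = ?",
            (amount, from_id),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The transaction failed in the middle: undo the credit as well.
        conn.rollback()
        return False


ok = transfer(conn, 1, 2, 500)  # fails: account 1 only holds 100
balances = conn.execute(
    "SELECT acct_id, balance FROM account ORDER BY acct_id"
).fetchall()
print(ok, balances)
```

After the failed transfer both balances are unchanged, even though the first update had already been applied when the second one failed.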

Simultaneous update
Granular access
 The DBMS can limit users to specific records within tables, and to specific columns
within records (or both).
Standard “API” language
  SQL is considered an “Application Programming Interface”, as it is
designed to make requests to an existing program (i.e. the DBMS).

2) Database Functionality

Configuration: Tiers (server, instance, schema, table, tablespace)


 Server – a computer that has the DBMS software installed.
  Instance – a running DBMS program. Servers might run several instances at the
same time (e.g. a “production” version and a “test” version).
 Schema – logical collection of tables.
 Table – the actual data storage structures.
 Tablespace – a storage location where the actual data underlying database objects
can be kept

Locking: Sharing data


  The lock manager ensures that one user’s changes to a record are committed before another
user can start updating that record. Locking prevents errors caused by two users
trying to change the same data at the same time. The lock manager also resolves “deadlocks”, where several
users are “frozen” because none of them can proceed without gaining access to a record
that another has locked.

Logging / Journalling / Committing Changes


  Logging – keeps the “before image” of records being changed in a
transaction. If a transaction fails in the middle, the before images are used to make the database as
it was before the transaction started. “UNDO LOG”
  Journaling – permanent changes to the database are redundantly stored in a
sequential journal. If any disaster strikes, the database can be restored to the
minute by using a combination of backup “images” and the journal. “REDO
LOG”
  Committing changes – related updates are committed all at once to the database.
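
The undo log and redo journal can be illustrated with a toy in-memory model. This is only a sketch: the dictionary "database", the function names and the rule that a None value marks an invalid change are all assumptions made for illustration, not how a real DBMS stores its logs.

```python
# Toy in-memory "database": a dict of record id -> value (illustrative only).
def apply_transaction(db, journal, changes):
    """Apply all changes or none; committed changes go to the journal."""
    undo_log = []  # before images, kept only while the transaction is open
    try:
        for key, new_value in changes:
            if new_value is None:
                raise ValueError("invalid change")
            undo_log.append((key, db.get(key)))  # save the before image
            db[key] = new_value
    except ValueError:
        # Roll back: restore the before images in reverse order.
        for key, old_value in reversed(undo_log):
            if old_value is None:
                db.pop(key, None)  # the record did not exist before
            else:
                db[key] = old_value
        return False
    journal.extend(changes)  # commit: record the permanent changes (redo log)
    return True


def roll_forward(backup, journal):
    """Rebuild the current state from a backup image plus the journal."""
    restored = dict(backup)
    for key, value in journal:
        restored[key] = value
    return restored


db = {"a": 1}
backup = dict(db)  # backup image taken before any further changes
journal = []
apply_transaction(db, journal, [("a", 2), ("b", 5)])     # commits
apply_transaction(db, journal, [("a", 9), ("c", None)])  # fails, rolled back
print(db, roll_forward(backup, journal))
```

Only the committed transaction reaches the journal, so replaying the journal over the backup reproduces the live data exactly.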
Catalog / Data Dictionary
  Catalog is a set of views that show metadata describing the objects in an
instance of SQL Server. Metadata is data that describes the attributes of objects in
a system.
 Data Dictionary is a set of database tables used to store information about a
database’s definition. Data Dictionary is used by SQL server to execute queries
and is automatically updated whenever objects are added, removed, or changed
within the database.
Constraints
  Constraints are used to specify rules (limits) on the data in a table.
o UNIQUE, NOT NULL, PRIMARY KEY (a combination of the two)
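
A minimal sketch of these constraints in action, using Python's sqlite3 module as a stand-in for the course DBMS. The bc_dept table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; column names are illustrative only.
conn.execute(
    "CREATE TABLE bc_dept ("
    " dept_id INTEGER PRIMARY KEY,"
    " dept_name TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO bc_dept VALUES (1, 'Accounting')")


def try_insert(conn, row):
    """Return None if the row is accepted, else the constraint error text."""
    try:
        conn.execute("INSERT INTO bc_dept VALUES (?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "duplicate key": try_insert(conn, (1, "Marketing")),
    "duplicate name": try_insert(conn, (2, "Accounting")),
    "missing name": try_insert(conn, (3, None)),
    "valid row": try_insert(conn, (4, "Marketing")),
}
for label, error in results.items():
    print(label, "->", error or "accepted")
```

The DBMS rejects the first three rows by itself; no application code has to re-check the rules.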

3) Database Application Design / Development


Entities – each entity describes a resource or action in our application and must have a
common value with at least one other entity to relate them.
Relationships / Cardinality
 Entity-Relationship Modeling (“E-R Modeling”)
o This is a design process that breaks data elements into the major
entities (resources and actions) required by an application and describes the
relationships between them.
o The goal is to maintain each piece of information exactly once, and link to
it whenever it is needed.
 Cardinality
o Is an attribute of the relationship that notes how many of one entity can be
linked to how many of the other.
Normalization
 is a design process that ensures the various data elements are arranged in a way
that will not hinder relational algebra.
Normalization Rules – are intended to ensure that the groupings will allow relational
logic to operate properly.
  1) No repeating entries (repeated fields or arrays)
o These should be stored in a separate table (they result from many-to-many
cardinality)
  2) If the table’s unique key is concatenated, all other fields must relate to the whole key, not to just one part of it.
  3) All attributes relate to the entity (all fields in the table must describe the key)
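
As a rough illustration of rule 1, the sketch below moves a repeating phone field into its own table. It uses Python's sqlite3 module as a stand-in for the course DBMS; the employee and employee_phone tables and the sample rows are hypothetical.

```python
import sqlite3

# Un-normalized input: repeating phone fields inside each employee record.
flat_rows = [
    (101, "Ann", "555-0100", "555-0101"),
    (102, "Bob", "555-0200", None),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
    -- Rule 1: the repeating field moves to its own table, one row per phone.
    CREATE TABLE employee_phone (
        emp_id INTEGER NOT NULL REFERENCES employee (emp_id),
        phone  TEXT NOT NULL,
        PRIMARY KEY (emp_id, phone)
    );
    """
)
for emp_id, name, *phones in flat_rows:
    conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, name))
    for phone in phones:
        if phone is not None:
            conn.execute(
                "INSERT INTO employee_phone VALUES (?, ?)", (emp_id, phone)
            )

# The shared emp_id value relates the two tables again when needed.
phone_counts = conn.execute(
    "SELECT e.emp_name, COUNT(p.phone) FROM employee e"
    " LEFT JOIN employee_phone p ON p.emp_id = e.emp_id"
    " GROUP BY e.emp_id ORDER BY e.emp_id"
).fetchall()
print(phone_counts)
```

An employee can now have any number of phones without changing the table definition, and the employee name is still stored exactly once.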
Common Values
 Relational algebra implements relations by shared values
 There must be common value to allow two entities to be related.
Best Practices (keys, naming)
 Intelligent and “Dumb” keys
o Entities require “primary key” fields that act as the primary identifier.
o Key fields must be unique and must always have data.
o Key should be free of “logic” (dumb), so that no change in business rules
or status would ever cause it to be changed.
 Column Naming
o When a key column from another table is included in a table, use a different column name that indicates the reason it is being included in
that table (e.g. proj_mgr for a column that holds an emp_id).
 Effective Dating
o A way of updating data such that the historical values are not lost.
Whenever a record changes, the existing data is marked as “retired”.
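
One possible way to implement effective dating is sketched below with Python's sqlite3 module as a stand-in for the course DBMS. The emp_salary table, the end_date column used to mark a record as retired, and the function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical salary history; end_date IS NULL marks the current record.
conn.execute(
    "CREATE TABLE emp_salary ("
    " emp_id INTEGER NOT NULL,"
    " salary INTEGER NOT NULL,"
    " eff_date TEXT NOT NULL,"
    " end_date TEXT)"
)
conn.execute("INSERT INTO emp_salary VALUES (101, 50000, '2020-01-01', NULL)")


def change_salary(conn, emp_id, new_salary, eff_date):
    """Retire the current record and add a new one; nothing is overwritten."""
    conn.execute(
        "UPDATE emp_salary SET end_date = ?"
        " WHERE emp_id = ? AND end_date IS NULL",
        (eff_date, emp_id),
    )
    conn.execute(
        "INSERT INTO emp_salary VALUES (?, ?, ?, NULL)",
        (emp_id, new_salary, eff_date),
    )


def salary_on(conn, emp_id, as_of):
    """Return the salary that was in effect on the given ISO date."""
    row = conn.execute(
        "SELECT salary FROM emp_salary WHERE emp_id = ? AND eff_date <= ?"
        " AND (end_date IS NULL OR end_date > ?)",
        (emp_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


change_salary(conn, 101, 55000, "2022-07-01")
print(salary_on(conn, 101, "2021-03-15"), salary_on(conn, 101, "2023-01-01"))
```

Both the old and the new salary remain in the table, so a report can ask what was true on any past date.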
4) SQL: Data Definition
Data Types: text, numeric, dates
 Character / Text:
o “Char(4)” indicates a fixed length of 4 characters.
o “varchar2(50)” indicates a variable length with a maximum size of 50.
 Numeric:
o Type “NUMBER” has a fixed number of decimal places, which you can
specify.
o Ex) “number (7,2)” would define a numeric field with a total of 7 digits, 2
of them being decimal places.
 Date / Time:
o “DATE” and “TIMESTAMP” store a valid date and time as a “time-
stamp”, which can be returned in any of several formats.
  This allows us to compare date values that were entered in different formats, e.g. “1/3/54” with “3-Jan, 1954”.
Tables: Columns, constraints
Create and Alter
 Constraints defined in the table definition include:
o “unique” – the value in a column cannot be duplicated in another row of
the table.
o “not null” – a record must have a valid value in the column
o “primary key” – a combination of unique and not null.
 Only one primary key per table, though it can contain more than
one column.
o “references” – a limit on the domain of values in a column: each value must
already occur in the table.column noted in the constraint.
o “check”- a comparison of the data in a column to literal data or other
columns in the same row.
 Tables are 2-dimensional structures essentially made up of columns.
 Column definitions have three parts:
1) Column name
  Column names must be unique within the table and cannot be entirely
numeric. Avoid reserved words such as “table”, “user” and “desc” as names.
2) Data type
3) Constraints (optional)

 Create/alter/drop
o Note the pairings of column name, data type and (optional) constraints.
o create table bc_employee
(emp_id number(3) primary key,
emp_name char(15) not null,
emp_start date not null);
o alter table bc_employee
add (new_column varchar2(25) unique);
alter table bc_employee
modify (new_column varchar2(50));
o drop table bc_employee;
  Note that this would also drop any related entities, such as indices
or permissions.
o describe bc_employee
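
The “references” and “check” constraints can be tried out with a small sketch in Python's sqlite3 module, used as a stand-in for the course DBMS (SQLite's data types and its PRAGMA commands differ from Oracle's). The bc_project table and the emp_phone column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces REFERENCES only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE bc_employee (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT NOT NULL,
        emp_start TEXT NOT NULL
    );
    -- Hypothetical second table to show "references" and "check".
    CREATE TABLE bc_project (
        proj_id  INTEGER PRIMARY KEY,
        proj_mgr INTEGER NOT NULL REFERENCES bc_employee (emp_id),
        budget   INTEGER CHECK (budget > 0)
    );
    INSERT INTO bc_employee VALUES (101, 'Ann', '2020-01-01');
    """
)


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


outcomes = [
    rejected("INSERT INTO bc_project VALUES (1, 101, 5000)"),  # valid
    rejected("INSERT INTO bc_project VALUES (2, 999, 5000)"),  # no such employee
    rejected("INSERT INTO bc_project VALUES (3, 101, -10)"),   # fails the check
]

# "alter table ... add" and a rough equivalent of "describe".
conn.execute("ALTER TABLE bc_employee ADD COLUMN emp_phone TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(bc_employee)")]
print(outcomes, columns)
```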
Sequences: Purpose, mechanism
 Sequences are essentially key generators. They return a unique number on
request. The number is very often used as the “dumb” key.
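
Oracle provides sequences through CREATE SEQUENCE and the NEXTVAL pseudo-column. SQLite has no sequence object, so the sketch below only emulates the mechanism with a one-row counter table, using Python's sqlite3 module. The seq table and the nextval function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Emulated sequence: one row per named counter (an illustrative stand-in).
conn.execute(
    "CREATE TABLE seq (name TEXT PRIMARY KEY, last_value INTEGER NOT NULL)"
)
conn.execute("INSERT INTO seq VALUES ('emp_seq', 100)")
conn.execute(
    "CREATE TABLE bc_employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL)"
)


def nextval(conn, name):
    """Advance the named counter and return the new value."""
    conn.execute(
        "UPDATE seq SET last_value = last_value + 1 WHERE name = ?", (name,)
    )
    return conn.execute(
        "SELECT last_value FROM seq WHERE name = ?", (name,)
    ).fetchone()[0]


# Each new employee gets a "dumb" key that carries no business meaning.
for emp_name in ("Ann", "Bob"):
    conn.execute(
        "INSERT INTO bc_employee VALUES (?, ?)",
        (nextval(conn, "emp_seq"), emp_name),
    )

ids = [row[0] for row in conn.execute(
    "SELECT emp_id FROM bc_employee ORDER BY emp_id"
)]
print(ids)
```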
Views: dynamic “partial” tables
 Views are definitions of answer sets that provide granular access or hide complex
logic. Views are logical filters on tables which allow us to limit access to the table
to specific rows and columns (granular access).
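
A minimal sketch of a view that limits both rows and columns, using Python's sqlite3 module as a stand-in for the course DBMS. The sales_staff view and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bc_employee (
        emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept TEXT, emp_salary INTEGER
    );
    INSERT INTO bc_employee VALUES
        (101, 'Ann', 'SALES', 50000),
        (102, 'Bob', 'SALES', 48000),
        (103, 'Cy',  'HR',    61000);
    -- The view exposes only SALES rows and hides the salary column.
    CREATE VIEW sales_staff AS
        SELECT emp_id, emp_name FROM bc_employee WHERE dept = 'SALES';
    """
)
cursor = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who is given access to sales_staff but not to bc_employee can see neither the HR row nor anyone's salary.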

5) SQL Data Manipulation


insert: generates a single record in a single table.
The INSERT INTO statement is used to add new rows of data to a table in the database.

INSERT INTO table_name (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);

update: update existing rows in table


UPDATE Customers
SET City='Hamburg'
WHERE CustomerID=1;

delete: DELETE statement is used to delete rows in a table.


DELETE FROM table_name
WHERE some_column=some_value;
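
The three statements can be run end to end with Python's sqlite3 module as a stand-in for the course DBMS. The Name column and the sample customers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, City TEXT)"
)

# insert: one statement adds one row to one table
conn.execute(
    "INSERT INTO Customers (CustomerID, Name, City) VALUES (1, 'Alfreds', 'Berlin')"
)
conn.execute(
    "INSERT INTO Customers (CustomerID, Name, City) VALUES (2, 'Bolido', 'Madrid')"
)

# update: only rows matching the WHERE clause change
updated = conn.execute(
    "UPDATE Customers SET City = 'Hamburg' WHERE CustomerID = 1"
).rowcount

# delete: without a WHERE clause every row would be removed
deleted = conn.execute("DELETE FROM Customers WHERE City = 'Madrid'").rowcount

remaining = conn.execute("SELECT CustomerID, Name, City FROM Customers").fetchall()
print(updated, deleted, remaining)
```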

  If a column name exists in more than one of the tables in a query, it must be “qualified”
with its table name (table.column) when used. Qualifying columns also makes the script more readable.
Outer Join
  A right outer join indicates that the right-hand table is the “master table”: all of its rows should be included,
whether or not they have a matching row in the left table. (A left outer join does the same for the left-hand table.)
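
The difference between an inner and an outer join can be seen in the sketch below, which uses Python's sqlite3 module as a stand-in for the course DBMS. It uses a LEFT OUTER JOIN with the master table written on the left, because older SQLite versions do not support RIGHT OUTER JOIN. The dept and emp tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'SALES'), (20, 'HR');
    INSERT INTO emp  VALUES (101, 'Ann', 10);
    """
)
# dept_id exists in both tables, so every use of it is qualified.
inner_rows = conn.execute(
    "SELECT dept.dept_name, emp.emp_name FROM dept"
    " JOIN emp ON emp.dept_id = dept.dept_id ORDER BY dept.dept_id"
).fetchall()
# dept is the master table: HR is kept even though it has no employees.
outer_rows = conn.execute(
    "SELECT dept.dept_name, emp.emp_name FROM dept"
    " LEFT OUTER JOIN emp ON emp.dept_id = dept.dept_id ORDER BY dept.dept_id"
).fetchall()
print(inner_rows, outer_rows)
```

The unmatched HR row comes back with an empty (NULL) employee name instead of being dropped.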
Subqueries
  Subqueries can be used to provide a needed calculation before the WHERE clause
evaluates a record.
Independent Subquery
 return a constant that will be used to filter all records.
Dependent Subquery
  Generate a distinct value for each row in the main query, based on values from the tables in the main query’s FROM clause.
They are run multiple times, so that each row in the main query is tested against
data relevant to it.
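
The two kinds of subquery can be compared side by side in a small sketch, using Python's sqlite3 module as a stand-in for the course DBMS. The emp table and its salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept TEXT, emp_salary INTEGER);
    INSERT INTO emp VALUES
        (1, 'SALES', 40000), (2, 'SALES', 60000),
        (3, 'HR',    70000), (4, 'HR',    90000);
    """
)
# Independent: the inner query yields one constant (the company average).
above_company_avg = conn.execute(
    "SELECT emp_id FROM emp"
    " WHERE emp_salary > (SELECT AVG(emp_salary) FROM emp) ORDER BY emp_id"
).fetchall()
# Dependent: the inner query refers to the outer row (e.dept), so each row
# is tested against the average of its own department.
above_dept_avg = conn.execute(
    "SELECT e.emp_id FROM emp e"
    " WHERE e.emp_salary >"
    " (SELECT AVG(d.emp_salary) FROM emp d WHERE d.dept = e.dept)"
    " ORDER BY e.emp_id"
).fetchall()
print(above_company_avg, above_dept_avg)
```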
Union
  Union clauses allow us to run a query that has two, or more, completely
independent selects and answer sets. The restriction is that the answer sets from the
various queries must be consistent: the same number of columns, with compatible data types. Union implies a “distinct” function, so any
duplicate rows will be eliminated from the answer set.
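
The implied “distinct” can be checked with a short sketch in Python's sqlite3 module, used as a stand-in for the course DBMS. The customer and supplier tables are hypothetical; UNION ALL is shown only for contrast, as the variant that keeps duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (name TEXT, city TEXT);
    CREATE TABLE supplier (name TEXT, city TEXT);
    INSERT INTO customer VALUES ('Acme', 'Boston'), ('Zenith', 'Denver');
    INSERT INTO supplier VALUES ('Acme', 'Boston'), ('Orbit', 'Austin');
    """
)
# Both selects return two text columns, so the answer sets are consistent.
union_rows = conn.execute(
    "SELECT name, city FROM customer"
    " UNION SELECT name, city FROM supplier ORDER BY name"
).fetchall()
union_all_rows = conn.execute(
    "SELECT name, city FROM customer"
    " UNION ALL SELECT name, city FROM supplier ORDER BY name"
).fetchall()
print(len(union_rows), len(union_all_rows))
```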
Data-type processing
 String and number functions
o to_char(emp_salary, '$999,999.00')
o WHERE to_char(in_date, 'Dy') = 'Fri'

Final Exam Study Guide
6) Procedural Logic (Final exam focus: concepts - not syntax)
Logic Basics. Procedures have 4 functional aspects.
1) Inputs
  Defining the “arguments”, i.e. the required inputs/outputs the procedure
needs to run.
2) Flow control
 Branching and iteration logic, e.g. if-then-else
3) Assignments (variables)
 The ability to store information for internal use, such as saving the data
from one query to use in another.
4) Exception handling
 The ability to react to errors and unexpected conditions.

Event-Driven: Triggers (“complex constraint”)


1) Triggers are procedures that are automatically executed by the DBMS when
certain SQL actions occur (“Event-driven”)
a. Purpose: by using the trigger mechanism, the DBA can ensure that the
procedure is executed every time the event occurs, rather than trusting
a user to remember to run the procedure.
b. Triggers can be timed to execute “before” or “after” the event, and can
execute either once for each row affected or once for each command.
c. Trigger has access to both the old and new versions by using qualifiers
“:new” or “:old”
i. Only “once per row” triggers can access the columns in the record
being changed using the “:new” or “:old” qualifier. “Once per
command” triggers cannot, since there might be multiple
records being inserted/updated.

2) Example
 Foreign key constraints ensure that a value exists in another table before it
can be entered in the referencing table.
  For a customer reference on a purchase order, we may want the customer not only
to exist, but also to be active and have a good credit rating. A trigger can enforce such a rule.
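
The purchase-order example can be sketched as a trigger in SQLite, driven from Python's sqlite3 module as a stand-in for the course DBMS. SQLite triggers always run once per row and use NEW rather than Oracle's “:new”. The table layouts, the trigger name and the error message are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY, active INTEGER NOT NULL, credit TEXT NOT NULL
    );
    CREATE TABLE purchase_order (
        po_id INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer (cust_id)
    );
    INSERT INTO customer VALUES (1, 1, 'GOOD'), (2, 0, 'GOOD'), (3, 1, 'POOR');

    -- Fires once per inserted row; NEW.cust_id is the value being inserted.
    CREATE TRIGGER po_customer_check BEFORE INSERT ON purchase_order
    BEGIN
        SELECT RAISE(ABORT, 'customer not active or credit not good')
        WHERE NOT EXISTS (
            SELECT 1 FROM customer c
            WHERE c.cust_id = NEW.cust_id AND c.active = 1 AND c.credit = 'GOOD'
        );
    END;
    """
)


def place_order(po_id, cust_id):
    """Return True if the order row is accepted by the trigger."""
    try:
        conn.execute("INSERT INTO purchase_order VALUES (?, ?)", (po_id, cust_id))
        return True
    except sqlite3.DatabaseError:
        return False


accepted = [place_order(1, 1), place_order(2, 2), place_order(3, 3)]
print(accepted)
```

All three customers exist, so a plain foreign key would accept every order; the trigger is what rejects the inactive customer and the one with poor credit.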

Procedures
 Procedures are very efficient, both in operation (the client can get a lot of work
done with one call to the database) and in maintenance (the application logic can
change without changing complex web programming logic)
  Command-driven: unlike a trigger, a procedure runs only when it is explicitly called.
 Advantages
a. Force users to use proper sequence of SQL commands to accomplish a
business transaction.
b. Insulate applications from needing to react to changes in the database.
c. Performance gains and developer ease.
d. Stronger likelihood that designer of the application database will build a
better sequence of commands.
 Arguments
o Variable-name, “in” or “out” or “inout”, datatype

7) Access control / Security (Final exam focus)


 Views
o Views are logical filters on tables which allow us to limit a person’s
access to a table.
 Roles
o Let us group people and privileges for ease of management. Roles are
critical to well-managed access control.
 Grant privileges (Grant access)
o Owner has all privileges on his objects. Other users have no privileges
until they are granted.
o 1) System privileges focus on user’s ability to create and manage new
resources.
 E.g. “log on”, “create tables”, “create procedures”
o 2) Resource privileges focus on the user’s ability to utilize existing
resources.
 E.g. “update”, “delete”, “insert”, “select”, “alter”
o Procedures also have “execute” access, which allows the holder to run
the procedure. A user with “execute” access does not need any grants on the tables
used by the procedure.
o “with grant option” allows the user to “pass on” the privilege to another user.
 Row-level security
o give users limited access to rows in a table.
 To maintain appropriate security.
o We can use views to provide such limited access to a table.
 But if there are many discrete access needs, and constantly
changing data, management of such views can be impossible.
o 2 steps for mechanism of row-level security
 1) create a row-level access table.
 2) create a single view which joins the main (student) table to
the access table (on student ID) and includes in the WHERE
clause a match between faculty ID and the user’s username.
o The view will provide each advisor access to only the students for
whom he has a matching entry in the access table.
o Oracle also provides “Virtual Private Database” that acts as a trigger
on queries by a user.
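
The two-step mechanism above can be sketched with Python's sqlite3 module as a stand-in for the course DBMS. SQLite has no logged-on user, so a one-row app_session table plays that role here; that table, the other table names and the sample data are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT);
    -- Step 1: the row-level access table (which advisor may see which student).
    CREATE TABLE student_access (student_id INTEGER, faculty_id TEXT);
    -- Stand-in for the DBMS's notion of the logged-on user (an assumption).
    CREATE TABLE app_session (username TEXT);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO student_access VALUES (1, 'smith'), (2, 'smith'), (3, 'jones');
    INSERT INTO app_session VALUES ('smith');
    -- Step 2: a single view joining the main table to the access table.
    CREATE VIEW my_students AS
        SELECT s.student_id, s.student_name
        FROM student s
        JOIN student_access a ON a.student_id = s.student_id
        WHERE a.faculty_id = (SELECT username FROM app_session);
    """
)


def visible_students(username):
    """Switch the simulated user and read through the view."""
    conn.execute("UPDATE app_session SET username = ?", (username,))
    return [
        row[1]
        for row in conn.execute("SELECT * FROM my_students ORDER BY student_id")
    ]


smith_rows = visible_students("smith")
jones_rows = visible_students("jones")
print(smith_rows, jones_rows)
```

One view serves every advisor; changing who may see which student only means changing rows in the access table.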

8) Accessing remote data (Final exam focus)


 Service ID (SID)
o Oracle maintains a “SID” for every database instance to which it needs
to connect. The SID is used to uniquely identify every instance in a
network.
 Synonyms
o Synonyms are alternate names for resources (tables or views) that make them more
readily accessible. The mechanism used with synonyms can make the data appear to
be local to the user.
o Advantage
  We can move a table to a new schema, change the synonym to
point to the new version of the table, and the change is
accomplished with no need for any user to change their local
SQL code.
 Remote DB links
  Remote DB links allow us to access tables on other database servers
as if they were local, which avoids the nightmare of copying and maintaining data
wherever it is needed.
  There is a “proxy” user which logs on to the remote database
instance and provides whatever access it owns to the local users.
  Occasional updates are better handled via remote links than via materialized views.
  Negative aspects
  Constant remote access adds network traffic.
  The network connection might not be available.
 Replication (Materialized views)
 Materialized views are tables that exist on a “local” database, which
are copies of a “master” table in another database.
 The DBMS manages the constant update of the tables to keep them
equal.
 Updates can be performed by sending a specialized “redo log”
to the other database server and having it apply the change.
 Materialized views are best when ‘centralized’ information is shared
in ‘read-only’ mode

9) Data Dictionary (Background only – not tested)


 Information tracked
o The Data Dictionary is used by SQL Server to execute queries and is
automatically updated whenever objects are added, removed, or changed.
 Use with business schemas

10) Decision Support (Final exam focus)


 Operational Database
o Designed to allow many users simultaneous update access.
o Leads to small tables (to save memory), normalization (so updates occur
in only one place) and brief locking.
o Only currently active data is included to keep the system nimble.
 Analysis Process
o Utilize massive amounts of data.
o “read-only” and can run for a long time.

 E-T-L
 Purpose: to move data from operational to data warehouse
environments.
 ETL stands for Extract, Transform and Load, which is a process used
to extract data from various sources, transform the data depending on
business rules/needs and load the data into a destination database.
 Extraction
  The differences in records, the “delta”, are determined and a series
of update commands is generated for the warehouse.
  Extraction can have issues with consistency/timing.
a) If the operational data is “effective dated”, all records
with an effective date after the last extraction can be
selected.
 Transformation
  Involves changes to make the data more useful for analytical
rather than operational purposes.
 3 major considerations
a) “de-normalization” to add foreign attributes to data for
ease of reporting. (assume no further update)
b) standardization – data must be consistent across
multiple sources.
c) Effective dating – to keep multiple copies of historical
information in such a way that we know which version
was relevant at any point of time.
i. A new value does not replace the old value, but
“retires it” and exists with it.
 Loading
  Simply the loading of the data into the warehouse.
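
The three E-T-L steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the order rows, the customer lookup, the state-code map and the in-memory “warehouse” list are all made up.

```python
from datetime import date

# Hypothetical operational rows: (order_id, cust_id, state, amount, eff_date)
orders = [
    (1, 10, "ma", 120.0, date(2024, 1, 5)),
    (2, 11, "Mass.", 80.0, date(2024, 2, 9)),
    (3, 10, "NY", 45.5, date(2024, 3, 1)),
]
customers = {10: "Acme", 11: "Zenith"}
STATE_CODES = {"ma": "MA", "mass.": "MA", "ny": "NY"}  # standardization map


def extract(rows, last_extraction):
    """Delta: only rows with an effective date after the last extraction."""
    return [r for r in rows if r[4] > last_extraction]


def transform(rows):
    """De-normalize (copy in the customer name) and standardize the state."""
    return [
        {
            "order_id": order_id,
            "cust_name": customers[cust_id],
            "state": STATE_CODES[state.lower()],
            "amount": amount,
            "eff_date": eff_date,
        }
        for order_id, cust_id, state, amount, eff_date in rows
    ]


def load(warehouse, rows):
    """Append the transformed rows to the warehouse table."""
    warehouse.extend(rows)


warehouse = []
load(warehouse, transform(extract(orders, date(2024, 1, 31))))
print([(w["order_id"], w["cust_name"], w["state"]) for w in warehouse])
```

Only the two orders dated after the last extraction are moved, and each warehouse row carries the customer name directly so that reports need no join.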
 Formatting differences
o Unstructured data needs to become structured data, e.g. Word documents or
Excel spreadsheets loaded into SQL tables.
 Analytics
 Categories (Models are broken into 3 groups)
 Predictive – searches for likelihood of a specific behavior,
based on attribute values.
 Clustering – searches for overall similarities, based on attribute
values.
 Association – searches for events that tend to occur together,
based on occurrences of an event.
 Predictive concepts (3 factors in evaluating the model’s output)
 Accuracy / Confidence – likelihood of B given A.
 Coverage / Strength – likelihood of A in the whole population.
 Interest – Difference between accuracy of entire population and
accuracy of a specific subset.
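
The three evaluation factors can be computed for a rule “A implies B” on a small made-up data set. This sketch reads “interest” as the accuracy within the subset that has A minus the overall likelihood of B in the whole population; that reading, the function name and the sample records are assumptions.

```python
# Hypothetical observations: each record notes whether A and B occurred.
records = [
    {"A": True, "B": True},
    {"A": True, "B": True},
    {"A": True, "B": False},
    {"A": False, "B": True},
    {"A": False, "B": False},
    {"A": False, "B": False},
    {"A": False, "B": False},
    {"A": False, "B": False},
]


def evaluate_rule(records):
    """Return (accuracy, coverage, interest) for the rule 'A implies B'."""
    total = len(records)
    with_a = [r for r in records if r["A"]]
    # Accuracy / confidence: likelihood of B given A.
    accuracy = sum(r["B"] for r in with_a) / len(with_a)
    # Coverage / strength: likelihood of A in the whole population.
    coverage = len(with_a) / total
    # Baseline: likelihood of B in the whole population.
    baseline = sum(r["B"] for r in records) / total
    # Interest: how much better the subset does than the population.
    interest = accuracy - baseline
    return accuracy, coverage, interest


accuracy, coverage, interest = evaluate_rule(records)
print(round(accuracy, 3), round(coverage, 3), round(interest, 3))
```

A rule with high accuracy but very low coverage applies to few records, and a rule with near-zero interest tells us little that the population average did not already show.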
