
Recovery: Transaction failures: Logic errors: the situation when a transaction attempts to maintain normal execution but can not. Examples include: bad input, data absence, overflow in numerical computation, or other logical problems. System error: when a transaction can not continue (e.g. a deadlock occurred). In such circumstances a partially executed transaction may have to be rolled back, restarted and re-executed.

System failures: System crashes: there may be a hardware malfunction, a bug in the DBMS, or an OS malfunction and crash. Each of these may result in loss of information stored in volatile storage. In principle, in such a situation the non-volatile memory should stay intact. This assumption is called the fail-safe assumption.

Non-volatile storage failure: The disk is an electronic or mechanical-electronic device. The reading arm may fail, the device may become corrupted. A flash disk may be destroyed. There are various schemes (using mirroring, error-correcting codes, and other techniques) to handle such problems.

What needs to be done for recovery: We need to: collect adequate info to be able to "repair" the database if there is a need to do so, be able to manage the info that is collected, make sure we can get this info when it is needed, be able to tell what went wrong, pinpoint the moment of crash, be able to reconstruct a consistent state of the DB, and ensure durability, even in the face of repeated failures.

DBMS organization: When the DBMS starts, a number of buffers in main memory are assigned to the DBMS. The size of such a buffer is (roughly) the same as the size of a block. When pages are needed for processing items that are in them, they are copied into buffers. A transaction gets its own workspace. The data needed by a transaction is copied from the buffer to that workspace. Eventually a committed transaction copies the data items it writes from its workspace to the buffer. Eventually the data is copied from the buffer to the disk or other stable memory used by the DBMS.

Support for recovery: We need to keep the data in two places (i.e. each block is kept in two places). When we output a block from main memory to the disk, we write the information to the first physical block (copy). Only when this succeeds do we write it to the second physical copy. The write operation succeeds only when both copies are successfully placed. The point of this is that we have at least one uncorrupted copy at any moment of time. So if we see that the copies are different, then we know that one is correct and the other one needs to be replaced.

Transactions: Transactions "work" in their workspace. Given a transaction T that uses data item X, we copy the content of X from the buffer to the variable xT residing in the workspace of T. This is done with the command read(X). If the block BX where X resides is not yet in main memory, the system issues input(BX); the variable xT is assigned the value of X in the corresponding buffer. When the transaction T issues write(X), this results in copying xT to the buffer where X is (and overwriting it). If the block BX where X resides is not yet in main memory, the system issues input(BX) and then copies the contents of xT to X (in BX).

Main idea of recovery: Before we do anything at all to the database, we store in stable storage the information describing the changes we intend to perform on the copy of the database that resides on disk. Only after the information about planned changes is safely saved do we execute these changes.

Log: The log consists of a list of log records. The main kind of records in the log are update log records. But there are other records, too: start of transaction, commit of transaction, and abort of transaction. The log requires periodic shortening. Any write operation must result in creating a log update record. Once that record is in the log, we must make sure that the part of the log containing that record is in stable storage before the update is executed.

Types of modification: When a transaction waits with its modifications until it commits, we talk about deferred modification. When the change may occur while the transaction is active (before it commits), we talk about immediate modification. Deferred modifications have an overhead: a transaction needs to keep local copies of the modified data. With long transactions this may be costly in terms of space. But the same record may be used more than once without the need to access the buffer.

Checkpointing: The log reflects every writing operation of the DBMS, so the log may grow very large. Thus redoing/undoing all transactions that are described by the log records may be very laborious. Moreover, redoing transactions that are already properly executed, with the results pushed to the disk, is a waste of resources and degrades performance. Thus recovery would take longer and longer as crashes occur. At some point of time we make sure we know all transactions that are active. The remaining transactions will be "closed". While checkpointing we do not allow new updates. We make sure that during the checkpointing all modified buffer pages are force-output to disk. Steps: output the log to stable memory; output to stable storage all 'dirty' pages. After the success of this last operation we output to stable storage an additional log record <ckpt L>, where L is the list of all transactions that are active at this time. We store the list L because it allows us to compute a point on the log where it is guaranteed that update log records before that point are properly reflected on the disk. This is the point where the earliest of the transactions in L started. That point of the log has the property that we can safely eliminate all update log records preceding it on the log. With the list L we can locate the point where redoing starts and undoing ends; let us call it pL. After checkpointing is finished, new transactions come. Therefore the collection of transactions that need to be considered in the recovery consists of L and all the transactions that started after the last checkpoint.

Remote Backup System: Centralized DBMSes and client-server architecture DBMSes (i.e. with a central database server) are vulnerable in case of catastrophic events. Such events can be fire, water damage, earthquakes and serious mechanical failures. For that reason critical applications attempt to keep an additional site (or sites) for backup and high availability. There is a variety of such backup architectures: 1. Company-owned. 2. Owned by specialized companies (well-known specialized companies are: Remote Backup Systems, Mozy, Barracuda and others). 3. There are various business models of such backups, and this now meshes with Cloud Computing (i.e. backup being done in the Cloud). The idea is that the secondary site mirrors both the primary site and the log. The issue is synchronization - making sure both sites have the same data (data now includes the log). This can not be absolutely assured, and additional factors such as network failures need to be considered. The secondary site is geographically remote from the primary site (for instance primary site in NYC, secondary site in NJ). When the primary site is down, the secondary site takes over. The secondary system must run recovery because its copy of the database may be outdated. Afterwards the secondary site provides the normal services. We can use the computing capabilities of the secondary site to continually make the remote site more similar to the primary one, i.e. redo transactions in the log without waiting for copies of pages from the primary site. We can perform additional checkpointing at the secondary site. All these additional actions are called hot-spare. Since the remote site must, eventually, be made "current", every transaction must be done there as well; we can not consider a transaction T committed until the record <T commit> is on the copy of the log at the secondary site. But this slows the system down. For that reason remote backup systems consider different levels of durability. One-safe: transaction T is committed once the record <T commit> is on the log at the primary site. Two-very-safe: T is committed once both sites are active and both copies of the log contain the record <T commit>. Two-safe: when both sites are active it is the same requirement on the log as in Two-very-safe; if only the primary site is active we proceed as in One-safe, that is, we require that <T commit> is written to the copy of the log kept at the primary site.

Security: Within an organization there may quite often be different categories of users. For instance, in the financial industry, one often needs to institute a Chinese wall between analysts that work for the company itself and analysts providing services to clients. Thus there are often natural "horizontal" partitions of people in this type of business. But there are also "vertical" partitions; some people may have more information than others, or even incomparable amounts of information. Legal offices are often retained for representation of companies, even ones that have their own legal department. When a large legal practice represents a client it may run into conflict of interest issues. If such a legal office represents a client, the individuals involved should not represent its adversary (adversaries). In short, if a potential client needs representation in a legal conflict with some entity that has the company on retainer, then there is a potential conflict of interest. Past legal cases may contain information affecting current ones. Access to such information may be illegal. Should some form of Chinese Wall be instituted there as well?

Defining security: "C" - for Confidentiality (information is only provided to authorized users). "I" - for Integrity (information is not altered by unauthorized users). "A" - for Availability (information is provided to properly identified authorized users).

Issue of granularity: The issue of granularity is very important in business applications. For example, the personal records of employees can be viewed by all the customer service representatives in the HR department. Such representatives should be able to change, say, an employee's phone number or address. But changing the SSN field must be seriously restricted, and likewise the salary field. In business applications we want flexibility. This is usually called Discretionary Access Control and is implemented by all major SQL-based DBMSes. In other applications we may want tight, centralized control. This is usually called Mandatory Access Control and is not part of the SQL standard. We may want to protect secrets (this is called mandatory access control, or the Bell-La Padula model). We may want to make the information more authoritative (this is called mandatory integrity control, or the Biba model). There are other models of security, for instance the Chinese Wall (also called the Brewer-Nash model).

To support the "need-to-know" paradigm (you get the data needed for doing your job but not more) SQL provides the notion of a view. Views are like tables, but they live 'virtually': we store their definitions, not their extents. Views are often sufficient to control access to information. There are problems with updating views; some issues are related to view definitions (e.g. views using multiple tables), some to their semantics.

The MAC model of security is based on two basic axioms: 1. Any user with classification t can write only to objects o that have classification t0 such that t <= t0. 2. Any user with classification t can read only objects o that have classification t0 such that t0 <= t. These are called "no write down" and "no read up". Sometimes an important variation, called the strong property, is adopted. This is a variation where users can write only information of their own level (no write down or strictly up). This property ensures that the user can check what they wrote. But reading "down" information is still possible.

Tranquility: This means that the classification of an object does not change during normal operation. This protects the subjects from inadvertent incorrect reading/writing. (There are subtler aspects of tranquility related to need-to-know: no one should have access to information until s/he needs it to do what s/he is supposed to do.)

MAC helps prevent the Trojan Horse attack. Say marek has classification SECRET, and so his table clients has the classification SECRET. Now, let us assume that eve has classification strictly smaller than SECRET (because otherwise she would have access to the table clients, but she does not). So eve has classification UNCLASSIFIED or CONFIDENTIAL. When eve (or any program owned by her) attempts to read the table clients, the request is rejected and the Security Officer is informed. This phenomenon of different users seeing different answers to the same query is called polyinstantiation.

Compartments: This gives us an additional mechanism for access control. Users are marked with compartment bits. Similarly, documents (objects) are marked with compartment bits. The user marek can access a table which has compartment share_pricing only if his bit corresponding to compartment share_pricing is 1; otherwise he cannot access this table. This mechanism is neutral w.r.t. the security model. There may be many compartments, each with its separate marking bit.

Biba Model: The idea is that your authority (position within the business hierarchy) determines what is considered authoritative data. For instance, the dean dexter represents more authority than the departmental chair alice. As they create "content", this content is considered authoritative in their part of the organization, that is, among people of the corresponding units. Specifically, dexter writes authoritatively to all people in the college, while alice can only write authoritatively to people in the department.

RBAC: In principle RBAC is neutral w.r.t. other access policies. We can have RBAC for DAC and RBAC for MAC. A standard for RBAC was set by US NIST. The SQL standards, starting with SQL:1999, support RBAC; that is, the GRANT command allows for granting privileges to a role and not only to individual users. Tools are needed to check: Which users are in a role? Which roles does a user belong to? Which privileges does a role possess? Which roles have a given privilege? How are roles related (inclusion, disjointness)?

Chinese Wall Policy: Used in conflict-of-interest situations. The CW policy is sometimes called the Brewer-Nash policy. The idea is that besides the objects (like in B-LP) we also have data sets, collections of objects. We assume that the data sets form a partition of the objects. Then, we have a conflict relation on the collection of data sets. This relation is supposed to be irreflexive and symmetric. Rules: Access is granted to subject s on object o if either o is in a data set which s already accessed, or o is in a data set which is not in conflict with any of the data sets already accessed by s. Observe that there is no difference between reading and writing. The reason for this is that the user may read something that she should not, or write something she should not! Example: A large law office currently represents ACME Corp against First Bank. In the past the office represented First Bank. Associates that represent ACME cannot see files related to the past representation of First Bank. One solution is making sure that the two sides do not access the same information.

How are policies enforced: On DAC systems, generally, by the DB server. The files used are system files that are stored on the server but are especially protected. In MAC, a separate security server is used to enforce the policy. It may be the same server as the DB server, or not. With the MIC (Biba) policy, again we should have a separate server. Similar issues arise with the CW policy. RBAC is neutral w.r.t. the other policies.

SQL injection: Consider the SQL command built by concatenation: SELECT * FROM accounts WHERE lname = '" + l + "'. If we assign to the variable l the string marek, the records of all accounts with lname equal to marek are returned. But when we assign to l the string ' OR '2' = '2, the SQL command becomes: SELECT * FROM accounts WHERE lname = '' OR '2' = '2'. The WHERE clause is evaluated as True (since '2' = '2' is a true statement, and the blank comparison is evaluated as NULL, and NULL OR True is True), and the entire table is returned. This may be used for attacking remote databases where the authentication is poorly programmed.

Defending against SQL injection: Observe that if we keep the data in the table users encrypted, encrypt the inputs and then check, this specific attack will fail, because the result will not coincide with what is in the database. Check the types of inputs. We can check for strings like DROP or DELETE. All data needs to be sanitized of any string that should not be part of the input. Never use string concatenation in SQL. Prepared statements should be used as much as possible. Application login should be implemented within a well-validated stored procedure. Encrypt, if the application is critical. The fundamental principle of access to databases is that there should be no unauthenticated access. This is sometimes called C2 security.

Audit: A basic requirement is that there must be an audit trail for login to and logout from the DBMS. Generally it is maintained as a table with the time, name of the user, sessionID, and which database is being used. It is easy to implement such an audit trail with triggers, but specialized add-ons for DBMSes are available. We may want to know how much each user uses the database, in terms of number of sessions, space used, and programs being executed. Sometimes we want to know data-definition activity (like: was a new table or view defined?). The point is that such activity may not be permitted by law (say, privacy laws). For instance, in various countries of Europe customer data can be kept a maximum of 6 months, so copying that data into a new table may lead to a violation of local laws. Accounts w/o passwords need to be purged. Quality of passwords must be audited. Creation of accounts needs to be audited. Database errors often need to be audited (because they, for instance, can indicate the presence of SQL injection attacks). Since triggers and stored procedures may do unexpected things, changes to such procedures may need to be audited. Changes to privileges may need to be audited, because such changes may indicate the presence of an attack. Changes to roles may need to be audited because they may result in integrity loss. Data that is deemed sensitive should be audited. SELECT statements may need to be audited for privacy.
Dormant accounts are dangerous and so we need to audit them.
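The write-ahead discipline described under "Main idea of recovery" (force the update log record to stable storage before executing the change; after a crash, redo committed work and undo the rest) can be sketched in a few lines of Python. This is a minimal illustration, not a real DBMS's log format: the dict stands in for the disk and the tuple layout of records is an assumption.

```python
disk = {"X": 100, "Y": 200}          # stable copy of the database
log = []                              # list of log records, oldest first

def write(txn, item, new_value):
    # The update record (with old and new value) reaches the log
    # before the in-place update is executed.
    log.append(("update", txn, item, disk[item], new_value))
    disk[item] = new_value

def commit(txn):
    log.append(("commit", txn))

def recover():
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo committed transactions forward...
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            disk[rec[2]] = rec[4]
    # ...then undo uncommitted ones backward.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            disk[rec[2]] = rec[3]

write("T1", "X", 150); commit("T1")
write("T2", "Y", 999)                 # T2 never commits: "crash" here
recover()
print(disk)                           # {'X': 150, 'Y': 200}
```

T1's write survives recovery; T2's uncommitted write is rolled back to the old value stored in its log record.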
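The two B-LP axioms ("no write down", "no read up") are easy to state as code. A minimal sketch, assuming a simple total order of levels and omitting compartments:

```python
# Classification levels as ordered integers (higher = more secret).
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}

def can_read(subject_level, object_level):
    # Axiom 2: read only objects o with class(o) <= class(subject).
    return LEVELS[object_level] <= LEVELS[subject_level]

def can_write(subject_level, object_level):
    # Axiom 1: write only to objects o with class(subject) <= class(o).
    return LEVELS[subject_level] <= LEVELS[object_level]

# marek's table clients is SECRET; eve is CONFIDENTIAL, so her read
# is rejected -- the Trojan Horse scenario from the notes.
assert can_read("SECRET", "SECRET")
assert not can_read("CONFIDENTIAL", "SECRET")   # no read up
assert not can_write("SECRET", "CONFIDENTIAL")  # no write down
```

The strong property would tighten `can_write` to equality of levels.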
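The Brewer-Nash access rule can likewise be sketched directly: a subject may touch an object iff its data set was already accessed, or it conflicts with no data set in the subject's history. The conflict relation below (the ACME / First Bank example) is illustrative.

```python
# Symmetric, irreflexive conflict relation on data sets.
conflicts = {("ACME", "FirstBank"), ("FirstBank", "ACME")}

def may_access(history, dataset):
    """history: the set of data sets this subject already accessed."""
    if dataset in history:
        return True
    return all((dataset, d) not in conflicts for d in history)

def access(history, dataset):
    # Grant (and record) access, or refuse; same rule for read and write.
    if may_access(history, dataset):
        history.add(dataset)
        return True
    return False

h = set()
assert access(h, "ACME")            # first touch is always allowed
assert not access(h, "FirstBank")   # conflicts with ACME, refused
assert access(h, "ACME")            # re-access of an own data set is fine
```

Note how the wall "builds itself" from the subject's history: which side of the conflict a subject ends up on depends only on what it touched first.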
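The injection above and the prepared-statement defense can both be demonstrated with Python's sqlite3 (the table and column names are made up for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (lname TEXT, balance REAL)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("marek", 100.0), ("eve", 50.0)])

payload = "' OR '2'='2"

# Vulnerable: string concatenation lets the payload rewrite the WHERE clause.
q = "SELECT * FROM accounts WHERE lname = '" + payload + "'"
rows_bad = con.execute(q).fetchall()        # the entire table leaks

# Safe: parameterized (prepared) statement; the payload is just data.
rows_ok = con.execute("SELECT * FROM accounts WHERE lname = ?",
                      (payload,)).fetchall()
print(len(rows_bad), len(rows_ok))          # 2 0
```

With the placeholder, the driver never splices the input into the SQL text, so there is no account named `' OR '2'='2` and no rows come back.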
DB18: Stable storage: Once a write procedure writes (stores) information to the medium, the medium must be capable of reading back that same information, instantly and without any errors. Databases as storage abstraction: Database theory and practice are based on an abstract treatment of stable storage where implementation details are hidden from the user. A Database Management System is such an abstraction.
Relational DBMS (Codd, 1970) - embedding databases into logic. We all have a built-in understanding of logic in its simplest form. The relational concepts, with their reference to a simple data structure (a rectangular table with the first row containing metadata), are very intuitive. The declarative description of queries ("Here is what I want") can be easily compiled into a procedural description ("Here is how I want this to be done") via so-called Relational Algebra. This compilation results in a query plan. This last step (the query plan) can be optimized. This results in very fast operation of the DBMS once the data is in main memory. Also, techniques for storage of DB information were improved (B+ trees). Indexing (actually developed before RDBMSes) speeds up retrieval. Multiple indexes have disadvantages too; an index is created only on fields that are used often. "Pure" relational RDBMS turned out to be impractical. Generally, the Relational Model treats descriptors (values of attributes) atomically (i.e. they have no further structure). But this is not true; for instance, strings have internal structure. Also time has structure. We cannot forfeit regular expressions in processing. SQL
includes all three basic database languages: Data Definition Language (DDL, "What is to be stored"), Data Manipulation Language (DML, "How is data changed when updates are made", "How do we insert data", "How do we delete data") and Query Language (QL, "How do we ask for what we want out of the DB"). DDL deals with metadata, that is, data about schemas. DML deals with data manipulation. QL describes queries. Impedance Mismatch: Relational databases return answers to queries as tables. Thus, in effect, an RDBMS returns an ordered list of simple objects (records), not just one object. This incompatibility of points of view may require further processing of tables, specifically passing through all rows. This phenomenon is called impedance mismatch. Impedance mismatch is solved by establishing a cursor which traverses the table row by row. Both JAVA and PYTHON provide support for handling IM.
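The cursor mechanism is easy to see in Python's DB-API (sqlite3 here; the table and its contents are made up for the example): the answer to a query is a table, and the program consumes it row by row through the cursor.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, salary REAL)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("alice", 90000.0), ("dexter", 120000.0)])

# The cursor bridges the mismatch: the set-oriented answer is handed
# to the record-oriented program one row at a time.
cur = con.execute("SELECT name, salary FROM emp ORDER BY name")
rows = [(name, salary) for name, salary in cur]
print(rows)    # [('alice', 90000.0), ('dexter', 120000.0)]
```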
Effects of cheaper disks: When less expensive (but also less reliable) disks appeared, the database community reacted with the RAID standards. RAID: Redundant Arrays of Inexpensive Disks. The idea is to replicate data to increase availability. Important RAID levels are RAID 1 and RAID 5: data is "mirrored" (duplicated on two or more disks) and striped (i.e. partitioned across different disks at the byte level), possibly with checksums. Also error-correcting codes (i.e. multiple checksums) are used. There are many RAID "levels", including one for "hot swap". Data decomposition - saving on storage: The idea is that one stores less and computes more (reconstructing the original tables via joins). An important class of normalized databases: third normal form (3NF) databases. A table T is in 3NF if, for each of its maintained functional dependencies X → Y: Y ⊆ X (the dependency is trivial), or X is a superkey, or each attribute of Y − X belongs to some candidate key. Federation: Data often comes from several
sources. For the police, there is a database of fugitives, a database of stolen cars, a database of individuals with multiple convictions, etc. But this data may be needed together - think of police in a cruiser who notice a car with plates that are in the database of stolen cars; they may want to see if the driver is a fugitive. This situation, where data from different sources may be presented together, is usually called a federated database. The software handling a federated database is called a mediator or data integrator. Federated DBMSes must handle a variety of issues: metadata management (for instance, incompatibility of names and of data types), precision conflicts (especially with numerical data) and data naming problems. Examples of applications: Medical Information Systems, Geographic Information Systems, Law Enforcement Information Systems. Federation in Law Enforcement: Public data on: vehicle information, bankruptcies, arrests, convictions, civil judgements, and alerts. Data is collected by federal, state, local, tribal, and territorial law enforcement organizations. Federal departments and agencies (DOJ, DHS, FBI, DEA, and ATF) collect information and make it available to state and other law enforcement organizations. Active component: Normally, databases are passive, i.e. the user initiates database processing.
But modern databases often possess an active component. Events (such as specific transactions) that satisfy conditions trigger various actions. For instance, user marek tries to take more than $1000.00 from his account - we roll back this transaction. Another example (also from the banking industry): overdraft handling. In medical applications: marek is late for his annual check-up. The world changed: The Web changed many things. Data became global. There was, suddenly, a need for processing much bigger data. A huge number of databases became accessible from far away. E-commerce emerged. There was a need for searching very large and poorly structured repositories of information - for instance, collections of Web pages. These searches could not really be done by means of RDBMSes of the kind used by banks, etc. The issue is not the speed of the search (in fact, once the data is in the RDBMS and in the processor's main memory, the processing is lightning fast) but the sheer volume of data (and its variability). The problem was (and is) how databases can support this changing world. What happens on the Web? Data is globalized, often internationalized. Documents (i.e. data items) may have very different structure. Even when data is uniform it can be huge (data that comes from geophysical satellites, meteorological data, network operations data, stock market data). Relatively small operations (our University IT) generate very large data - so big that it is not possible to store it (network traffic data). Internet
companies: These companies had to innovate. Google originally led the way with their BigTable storage and their Map-Reduce framework. This was eventually published by Google and reimplemented as open source by Apache in the form of Hadoop and its offspring, such as Hadoop2 and Spark. Other "big players" on the Internet included Apple, Facebook, Twitter, LinkedIn and other companies. For instance, Facebook originally introduced the noSQL database system Cassandra, which was then open-sourced. The big-player companies created big Data Centers to support their work. These centers not only serve the companies themselves but also often offer various services (usually having the string "aaS" in the name): Storage as a Service (SaaS), Infrastructure as a Service (IaaS), and other services. Example: Elastic Compute Cloud (EC2) by Amazon, providing a virtual private cloud. After abandoning RDBMS, companies such as Google and Facebook eventually created novel architectures that use reliable RDBMSes as elements of their storage systems. For instance, it is known that Facebook uses two-tier storage with RDBMSes (actually MySQL or the like) as the lower tier and in-memory databases as the upper tier. Big data: Big Data is a term for dealing with large collections of data.
Often not structured. Analytics: The class of software handling Big Data is called analytics. Example: a big cereal company watches sales of its products to see where it needs to spend its advertising money. Social networking companies such as Facebook, Twitter, LinkedIn, and others need a different kind of database, namely one that handles networks described by various types of graphs. Relational databases are not good at processing graphs: join processing is costly. Other ways to process graphs: bfs, dfs. So different data structures are used for graph processing, and a class of databases called graph databases is used for this purpose.
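One of the traversals mentioned (bfs) is a few lines over an adjacency-list graph, which shows why a pointer-chasing data structure beats repeated relational joins here. The follower graph below is hypothetical:

```python
from collections import deque

def bfs(graph, start):
    """Return the vertices reachable from start, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph.get(v, []):
            if w not in seen:       # visit each vertex once
                seen.add(w)
                queue.append(w)
    return order

# Adjacency list, as a graph database might materialize it.
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(g, "a"))    # ['a', 'b', 'c', 'd']
```

In SQL, each hop of this traversal would be another self-join (or a recursive query); here each hop is a constant-time adjacency-list lookup.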
Another class of databases stores objects. These are accessed by programs written in an imperative language. Examples include ObjectStore and Objectivity. Then there are noSQL databases; examples include Cassandra, HBase, BigTable and others. Databases in this category are often distributed.
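A toy, single-process sketch of the Map-Reduce pattern mentioned above (word count, the classic example). Real frameworks such as Hadoop or Spark distribute the map, shuffle and reduce phases across machines; the three functions here only show the shape of the computation.

```python
from collections import defaultdict

def map_phase(docs):
    # map: each document emits (word, 1) pairs
    for doc in docs:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # shuffle: group the emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: combine the values for each key
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big table", "map reduce big"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 1, 'table': 1, 'map': 1, 'reduce': 1}
```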
