Vous êtes sur la page 1sur 10

9/18/2014

Lecture No 04

SQL: Behind the scenes

By: Syed Aun Irtaza

What is data model?

Three components: 1. Structures


rows and columns?
1. Structures nodes and edges?
2. Constraints key-value pairs?
3. Operations
a sequence of bytes?
2. Constraints
all rows must have the same number of columns
all values in one column must have the same type
a child cannot have two parents
3. Operations
find the value of key x
find the rows where column lastname is Jordan
get the next N bytes

1
9/18/2014

A collection of information organized to


afford efficient retrieval When people use the word
database, fundamentally what they
are saying is that the data should
be self-describing and it should
have a schema. Thats really all the
word database means.
-- Jim Gray, The Fourth Paradigm

How is the data physically organized on disk?


What kinds of queries are efficiently
1. Sharing supported by this organization, and what
Support concurrent access by multiple readers and kinds are not?
writers How hard is it update the data, or add new
2. Data Model Enforcement data?
Make sure all applications see clean, organized data
What happens when I encounter new queries
3. Scale that I didnt anticipate? Do I reorganize the
Work with datasets too large to fit in memory
data? How hard is that?
4. Flexibility
Use the data in new, unanticipated ways

One View Everything is a table


Relational Database Management Systems Every row in a table has the same columns
were invented to let you use one set of data Relationships are implicit: no pointers
in multiple ways, including ways that are
unforeseen at the time the database is built
and the 1st applications are written.
(Curt Monash, analyst/blogger)

2
9/18/2014

God made the integers; all else is the work of Pre-Relational: if your data changed, your
application broke.
man. Early RDBMS were buggy and slow (and often
(Leopold Kronecker, 19th Century Mathematician) reviled), but required only 5% of the application
Codd made relations; all else is the work of code.
Activities of users at terminals and most
man. application programs should remain unaffected
(Raghu Ramakrishnan, DB text book author) when the internal representation of data is
changed and even when some aspects of the
external representation are changed. Codd-- 79
Key Ideas: Programs that manipulate tabular data
exhibit an algebraic structure allowing reasoning
and manipulation independently of physical data
representation

N = ((z*2)+((z*3)+0))/1 5 1
Algebraic Laws:
1. (+) identity: x+0 = x
2. (/) identity: x/1 = x 4 3
3. (*) distributes: (n*x+n*y) = n*(x+y) 2
4. (*) commutes: x*y = y*x
Apply rules 1, 3, 4, 2:
N = (2+3)*z
two operations instead of five, no division
operator
Same idea works with the Relational Algebra!

3
9/18/2014

Sets: {a,b,c}, {a,d,e,f}, { }, . . .


Bags: {a, a, b, c}, {b, b, b, b, b}, . . .

Relational Algebra has two semantics:


Set semantics = standard Relational Algebra
Bag semantics = extended Relational Algebra

Rule of thumb:
Every paper will assume set semantics
Every implementation will assume bag semantics

R1 R2 R1 - R2
SELECT * FROM R1 SELECT * FROM R1
UNION All EXCEPT
SELECT * FROM R2 SELECT * FROM R2

Derived operator using minus Returns all tuples which satisfy a condition

R1 R2 = R1 (R1 R2) c(R)

Derived operator using minus Examples


sSalary > 40000 (Employee)
R1 R2 = R1 R2 sname = Smith (Employee)
The condition c can be =, <, , >, , <>

4
9/18/2014

Eliminates columns

A1,,An (R)

Example: project social-security number and


names:
P SSN, Name (Employee)
Answer(SSN, Name)

Each tuple in R1 with each tuple in R2

R1 R2

Traditionally rare in practice, but can come up


in analytics
Find all pairs of similar images/tweets/songs
Compute the cross product, then compute
a similarity function f(x1,x2) for every
possible pair

R1 A=B R2 = A=B (R1 R2)

SELECT * SELECT *
FROM R1 JOIN R2 FROM R1, R2
ON R1.A = R2.B WHERE R1.A = R2.B

Two ways to spell the same query


The optimizer doesnt care about the syntax you
use; its going to work on the algebraic
representation anyway.
Sometimes one syntax or the other is more
convenient.

5
9/18/2014

A join that involves a predicate Find all hospitals within 5 miles of a school

R1 R2 = (R1 R2)
name(Hospitals distance(location,location) < 5 Schools)
Here can be any condition

Note that equi-join is a special case of SELECT DISTINCT h.name


FROM Hospitals h, Schools s
theta join where is an equality condition
WHERE distance(h.location, s.location) < 5

Find all user clicks made within 5 seconds of Outer join


a page load Include tuples with no matches in the output
Use NULL values for missing attributes

Clicks abs(click_time load_time) < 5 PageLoads Variants


Left outer join
Right outer join
Full outer join
SELECT *
FROM Clicks c, PageLoads p
WHERE abs(c.click_time - p.load_time) < 5

You might hear band join or range join

SELECT binid,
round(avg(cast(Fluo as float)),3) as Fluo,
round(avg(cast(Oxygen as float)),3) as Oxygen,
round(avg(cast(Nitrate_uM as float)),3) as Nitrate_uM,
round(avg(cast(longitude as float)),3) as longitude,
round(avg(cast(latitude as float)),3) as latitude
FROM (
SELECT *, cast(floor(ts) +
floor((ts - floor(ts))*24*60/binsize) *
binsize / (24*60) as datetime) as binid
FROM (
SELECT *, cast(timestamp as float) as ts, 5.0 as binsize
FROM Tokyo_4_merged_data_time
)x
) bins
GROUP BY binid
ORDER BY binid asc

6
9/18/2014

7
9/18/2014

PostgreSQL (and Greenplum)


SQL, PL/pgSQL, Python, C/C++, R,
Microsoft SQL Server
SQL, T-SQL, C# or any CLR language
Oracle
SQL, PL-SQL, Java, C/C++, Python, others
SQLite
NONE!! (sorry)

8
9/18/2014

A view is just a query with a name


We can use the view just like a real table

9
9/18/2014

A view is a relation defined by a query

10

Vous aimerez peut-être aussi