The Lost Art of the Self Join
Beat Vontobel, CTO, MeteoNews AG
b.vontobel@meteonews.ch
Why and what?
• Idea for session dates back to 2005
‣ Sudoku solver in a Stored Procedure (Per-Erik Martin)
‣ „The lost Art of the Join“ (Erik Bergen)
‣ Self Joins in my presentation from last year, „The declarative power of VIEWs“
• A few serious but simple examples of Self Joins
• One more complex example, to be taken less seriously
From last year: Paradigms
• Imperative Programming
‣ PHP, C, Java…
‣ Specify the Algorithm: How?
• Declarative Programming
‣ Prolog, Lisp, XSLT, SQL…
‣ Specify the Goal: What?
Every Table needs an Alias
SELECT child.child AS child,
       sibling.child AS sibling
  FROM parents [AS] child
 INNER JOIN parents [AS] sibling
    ON child.parent = sibling.parent
 WHERE child.child != sibling.child;

Family tree: Martha → Paul, Chris; Chris → Julie

Table parents:
parent child
martha paul
chris julie
martha chris
A simple Self Join
SELECT child.child AS child,
       sibling.child AS sibling
  FROM parents AS child
 INNER JOIN parents AS sibling
    ON child.parent = sibling.parent
 WHERE child.child != sibling.child;

The two references to parents, joined on parent:
child:           sibling:
child parent     parent child
paul  martha     martha paul
julie chris      chris  julie
chris martha     martha chris

+-------+---------+
| child | sibling |
+-------+---------+
| Paul  | Chris   |
| Chris | Paul    |
+-------+---------+
2 rows in set (0.00 sec)
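The sibling query can be tried without a MySQL server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL; the table contents are the three rows shown on the slide, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parents (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parents VALUES (?, ?)",
    [("martha", "paul"), ("chris", "julie"), ("martha", "chris")],
)

# The same table appears twice; only the aliases tell the two roles apart.
siblings = conn.execute(
    """
    SELECT child.child AS child,
           sibling.child AS sibling
      FROM parents AS child
     INNER JOIN parents AS sibling
        ON child.parent = sibling.parent
     WHERE child.child != sibling.child
     ORDER BY child.child
    """
).fetchall()

for child, sibling in siblings:
    print(child, sibling)
```

Julie does not show up because no other row shares her parent.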
Trees in SQL
• Basic Text Book Example: Employees Table
• „Nested Set Model“
‣ Google for „Trees SQL Mike Hillyer“
Restriction on Self Joins: Temporary Tables
mysql> CREATE TEMPORARY TABLE t (i INT);
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT *
FROM t t1
CROSS JOIN t t2;
ERROR 1137 (HY000): Can't reopen table: 't1'

Workaround:
Create regular (non-temporary) tables with unique names
(e.g. using the session ID)

mysql> CREATE TABLE t_89372 (i INT);


Example table: Temperatures
temps
station CHAR(3)        PK
dtime   TIMESTAMP      PK
temp    DECIMAL(3, 1)

mysql1.intern-test [admin] > SELECT * FROM temps;


+---------+---------------------+------+
| station | dtime | temp |
+---------+---------------------+------+
| ABO | 2008-04-04 00:10:00 | -2.0 |
| ABO | 2008-04-04 00:20:00 | -1.9 |
| … | … | … |
| BAS | 2008-04-04 00:10:00 | 6.1 |
| BAS | 2008-04-04 00:20:00 | 6.2 |
| … | … | … |
+---------+---------------------+------+
Absolute to relative
SELECT current.station AS stat,
       current.dtime,
       previous.temp AS prev,
       current.temp AS curr,
       current.temp - previous.temp AS diff
  FROM temps current
 INNER JOIN temps previous
    ON current.station = previous.station
   AND previous.dtime = current.dtime - INTERVAL 10 MINUTE
 ORDER BY diff DESC
 LIMIT 10
Absolute to relative
+------+---------------------+-------+-------+------+
| stat | dtime | prev | curr | diff |
+------+---------------------+-------+-------+------+
| SAM | 2008-04-04 08:20:00 | -4.8 | -1.1 | 3.7 |
| MAG | 2008-04-04 01:10:00 | 7.6 | 10.7 | 3.1 |
| BUF | 2008-04-04 22:10:00 | -13.1 | -10.2 | 2.9 |
| MAG | 2008-04-04 05:00:00 | 7.3 | 10.1 | 2.8 |
| CIM | 2008-04-04 10:00:00 | 1.8 | 4.6 | 2.8 |
| MAG | 2008-04-04 00:20:00 | 6.0 | 8.4 | 2.4 |
| CHZ | 2008-04-04 09:40:00 | 7.8 | 10.2 | 2.4 |
| MAG | 2008-04-04 04:20:00 | 6.3 | 8.7 | 2.4 |
| EGH | 2008-04-04 12:10:00 | -8.5 | -6.2 | 2.3 |
| VIS | 2008-04-04 05:40:00 | -1.8 | 0.5 | 2.3 |
+------+---------------------+-------+-------+------+
10 rows in set (0.21 sec)
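A runnable sketch of the same "absolute to relative" join, assuming Python's sqlite3 module instead of MySQL. SQLite has no INTERVAL syntax, so datetime(..., '-10 minutes') is swapped in; the aliases cur and prev and the third ABO row are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL, "
    "PRIMARY KEY (station, dtime))"
)
conn.executemany(
    "INSERT INTO temps VALUES (?, ?, ?)",
    [
        ("ABO", "2008-04-04 00:10:00", -2.0),
        ("ABO", "2008-04-04 00:20:00", -1.9),
        ("ABO", "2008-04-04 00:30:00", -1.5),  # made-up row
        ("BAS", "2008-04-04 00:10:00", 6.1),
        ("BAS", "2008-04-04 00:20:00", 6.2),
    ],
)

# Each row is paired with the row of the same station 10 minutes earlier.
changes = conn.execute(
    """
    SELECT cur.station, cur.dtime, prev.temp, cur.temp,
           ROUND(cur.temp - prev.temp, 1) AS diff
      FROM temps AS cur
     INNER JOIN temps AS prev
        ON cur.station = prev.station
       AND prev.dtime = datetime(cur.dtime, '-10 minutes')
     ORDER BY diff DESC, cur.station
     LIMIT 10
    """
).fetchall()

for row in changes:
    print(row)
```

The first measurement of each station has no predecessor, so the INNER JOIN silently drops it.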
Missed opportunity for a Self Join
// Typical example of „keeping state“ over
// a loop fetching rows from the database

$oldrow = null; // no previous row on the first iteration
while ($row = mysql_fetch_row($result)) {
    if ($oldrow !== null) {
        // Computation involving $oldrow and $row
    }
    $oldrow = $row;
}
Why not in a loop?
• SQL code is clearer
• A loop creates a logical dependency between SQL and application level
• But: „I have to loop anyway! Might even be faster in some cases…“
‣ You have to ORDER BY whatever the computation needs
‣ What if a different order is requested for the result?
‣ You might miss the opportunity to use a framework
Use for any kind of serial data…
• Meteorological data
• Racing lap times
• Fuel used
• Bank account figures
• Stock values
• Webserver hit statistics
• Mails processed
• …
Fill the gaps (simple linear interpolation)
SELECT current.dtime,
       current.temp AS orig,
       COALESCE( current.temp,
                 ROUND((prev.temp + next.temp) / 2, 1)
       ) AS interpol
  FROM temps current
 INNER JOIN temps prev
    ON prev.dtime = current.dtime - INTERVAL 10 MINUTE
   AND prev.station = current.station
 INNER JOIN temps next
    ON next.dtime = current.dtime + INTERVAL 10 MINUTE
   AND next.station = current.station
 WHERE current.station LIKE 'TAE'
Fill the gaps (simple linear interpolation)
+---------------------+------+----------+
| dtime | orig | interpol |
+---------------------+------+----------+
| … | … | … |
| 2008-04-04 16:30:00 | 7.9 | 7.9 |
| 2008-04-04 16:40:00 | 8.0 | 8.0 |
| 2008-04-04 16:50:00 | NULL | 7.9 |
| 2008-04-04 17:00:00 | 7.8 | 7.8 |
| … | … | … |
+---------------------+------+----------+
142 rows in set (0.01 sec)
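The gap filling can be reproduced on the four rows around the NULL shown above. This is a sketch under the assumption that sqlite3 stands in for MySQL; the aliases cur and nxt and the datetime() modifiers are substitutions, not the original syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO temps VALUES ('TAE', ?, ?)",
    [
        ("2008-04-04 16:30:00", 7.9),
        ("2008-04-04 16:40:00", 8.0),
        ("2008-04-04 16:50:00", None),  # the gap to fill
        ("2008-04-04 17:00:00", 7.8),
    ],
)

# Two self joins: one to the previous and one to the next measurement.
filled = conn.execute(
    """
    SELECT cur.dtime,
           cur.temp AS orig,
           COALESCE(cur.temp,
                    ROUND((prev.temp + nxt.temp) / 2, 1)) AS interpol
      FROM temps AS cur
     INNER JOIN temps AS prev
        ON prev.dtime = datetime(cur.dtime, '-10 minutes')
       AND prev.station = cur.station
     INNER JOIN temps AS nxt
        ON nxt.dtime = datetime(cur.dtime, '+10 minutes')
       AND nxt.station = cur.station
     WHERE cur.station = 'TAE'
     ORDER BY cur.dtime
    """
).fetchall()

for row in filled:
    print(row)
```

Only rows that have both neighbours survive the two INNER JOINs, so the first and last measurement of the series are not part of the result.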
Walking average
SELECT current.dtime,
       current.temp,
       ROUND( ( 3 * current.temp
              + 2 * prev1.temp
              + 1 * prev2.temp ) / 6, 1
       ) AS walking_avg
  FROM temps current
 INNER JOIN temps prev1
    ON prev1.dtime = current.dtime - INTERVAL 10 MINUTE
   AND prev1.station = current.station
 INNER JOIN temps prev2
    ON prev2.dtime = current.dtime - INTERVAL 20 MINUTE
   AND prev2.station = current.station
 WHERE current.station LIKE 'TAE'
 ORDER BY current.dtime
Walking average
+---------------------+------+-------------+
| dtime | temp | walking_avg |
+---------------------+------+-------------+
| … | … | … |
| 2008-04-04 09:10:00 | 5.7 | 5.7 |
| 2008-04-04 09:20:00 | 5.8 | 5.7 |
| 2008-04-04 09:30:00 | 6.3 | 6.0 |
| 2008-04-04 09:40:00 | 6.3 | 6.2 |
| 2008-04-04 09:50:00 | 6.0 | 6.2 |
| 2008-04-04 10:00:00 | 6.2 | 6.2 |
| 2008-04-04 10:10:00 | 6.6 | 6.4 |
| 2008-04-04 10:20:00 | 6.6 | 6.5 |
| 2008-04-04 10:30:00 | 6.6 | 6.6 |
| … | … | … |
+---------------------+------+-------------+
142 rows in set (0.01 sec)
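The weights 3, 2 and 1 can be checked on the first four rows of the output above. Again a sketch, assuming sqlite3 in place of MySQL and the alias cur in place of current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO temps VALUES ('TAE', ?, ?)",
    [
        ("2008-04-04 09:10:00", 5.7),
        ("2008-04-04 09:20:00", 5.8),
        ("2008-04-04 09:30:00", 6.3),
        ("2008-04-04 09:40:00", 6.3),
    ],
)

# The current value counts three times, the previous twice, the one before once.
walking = conn.execute(
    """
    SELECT cur.dtime,
           cur.temp,
           ROUND((3 * cur.temp + 2 * prev1.temp + 1 * prev2.temp) / 6, 1)
               AS walking_avg
      FROM temps AS cur
     INNER JOIN temps AS prev1
        ON prev1.dtime = datetime(cur.dtime, '-10 minutes')
       AND prev1.station = cur.station
     INNER JOIN temps AS prev2
        ON prev2.dtime = datetime(cur.dtime, '-20 minutes')
       AND prev2.station = cur.station
     WHERE cur.station = 'TAE'
     ORDER BY cur.dtime
    """
).fetchall()

for row in walking:
    print(row)
```

With only four input rows, the first two have no complete history and drop out; the remaining two match the 09:30 and 09:40 lines of the slide.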
Coherence („Correlation“)
SELECT source.station, correlated.station,
       STDDEV(source.temp - correlated.temp) AS dev,
       AVG(source.temp - correlated.temp) AS offset
  FROM temps source
 INNER JOIN temps correlated
    ON source.dtime = correlated.dtime
 WHERE source.station = 'TAE'
 GROUP BY source.station, correlated.station
 ORDER BY dev
Coherence („Correlation“)
+---------+---------+---------+----------+
| station | station | dev | offset |
+---------+---------+---------+----------+
| TAE | TAE | 0.00000 | 0.00000 |
| TAE | ABO | 0.60563 | 5.43636 |
| TAE | FRE | 0.65031 | 4.14615 |
| … | … | … | … |
| TAE | CHZ | 1.05226 | -1.57063 |
| TAE | CIM | 1.07280 | 3.45035 |
| … | … | … | … |
| TAE | SBO | 3.58539 | -6.88811 |
+---------+---------+---------+----------+
87 rows in set (0.04 sec)
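A small sketch of the coherence idea with made-up numbers, assuming sqlite3 as a stand-in. SQLite does not necessarily ship a STDDEV function, so the population standard deviation is derived in Python from AVG(diff) and AVG(diff * diff); the column names offset_avg and sq_avg are invented for this example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
times = ["2008-04-04 00:10:00", "2008-04-04 00:20:00", "2008-04-04 00:30:00"]
series = {  # made-up values: ABO follows TAE exactly, SBO does not
    "TAE": [10.0, 11.0, 12.0],
    "ABO": [5.0, 6.0, 7.0],
    "SBO": [15.0, 18.0, 16.0],
}
conn.executemany(
    "INSERT INTO temps VALUES (?, ?, ?)",
    [(s, t, v) for s, vals in series.items() for t, v in zip(times, vals)],
)

stats = conn.execute(
    """
    SELECT correlated.station,
           AVG(source.temp - correlated.temp) AS offset_avg,
           AVG((source.temp - correlated.temp)
               * (source.temp - correlated.temp)) AS sq_avg
      FROM temps AS source
     INNER JOIN temps AS correlated
        ON source.dtime = correlated.dtime
     WHERE source.station = 'TAE'
     GROUP BY source.station, correlated.station
    """
).fetchall()

# Population standard deviation: sqrt(E[x^2] - E[x]^2), clamped at zero.
coherence = sorted(
    (math.sqrt(max(sq - off * off, 0.0)), station, off)
    for station, off, sq in stats
)
for dev, station, off in coherence:
    print(station, round(dev, 5), round(off, 5))
```

A deviation of zero means the other station tracks TAE perfectly at a constant offset, which is the case for the made-up ABO series.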
Groupwise maximum row (Subquery)
SELECT a.station,
       a.dtime,
       a.temp
  FROM temps a
 WHERE a.temp = ( SELECT MAX(temp)
                    FROM temps b
                   WHERE b.station = a.station )
 ORDER BY a.station, a.temp;
Groupwise maximum row
+---------+---------------------+-------+
| station | dtime | temp |
+---------+---------------------+-------+
| ABO | 2008-04-04 14:40:00 | 4.7 |
| AIG | 2008-04-04 13:40:00 | 13.0 |
| ALT | 2008-04-04 12:20:00 | 10.9 |
| ALT | 2008-04-04 12:30:00 | 10.9 |
| … | … | … |
| WYN | 2008-04-04 14:30:00 | 10.5 |
| ZER | 2008-04-04 14:10:00 | 5.3 |
+---------+---------------------+-------+
114 rows in set (4.03 sec)
Groupwise maximum row (Self Join)
SELECT maximum.station,
       maximum.dtime,
       maximum.temp
  FROM temps maximum
  LEFT JOIN temps higher
    ON maximum.station = higher.station
   AND maximum.temp < higher.temp
 WHERE higher.station IS NULL
   AND maximum.temp IS NOT NULL
 ORDER BY maximum.station, maximum.temp
Groupwise maximum row (Joined Subquery)
SELECT a.station,
       a.dtime,
       a.temp
  FROM temps a
 INNER JOIN ( SELECT station, MAX(temp) AS temp
                FROM temps
               GROUP BY station ) b
    ON (a.station, a.temp) = (b.station, b.temp)
 ORDER BY a.station, a.temp;
Groupwise maximum row (Alternatives)
Same result for all three variants (last row: ZER, 2008-04-04 14:10:00, 5.3):

Variant              Rows  Time
CORRELATED SUBQUERY  114   4.03 sec
SELF JOIN            114   1.43 sec
JOINED SUBQUERY      114   0.05 sec
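That the three variants are interchangeable can be checked on a tiny table. The sketch below assumes sqlite3 as a stand-in and uses made-up rows, including a tie for station ALT like the one in the output above. The row-value comparison of the joined subquery is written as two AND-ed conditions, a substitution made for portability. Timings on such a small table say nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO temps VALUES (?, ?, ?)",
    [
        ("ABO", "2008-04-04 14:30:00", 4.1),
        ("ABO", "2008-04-04 14:40:00", 4.7),
        ("ALT", "2008-04-04 12:10:00", 9.0),
        ("ALT", "2008-04-04 12:20:00", 10.9),
        ("ALT", "2008-04-04 12:30:00", 10.9),  # tie: both rows are maxima
    ],
)

QUERIES = {
    "correlated subquery": """
        SELECT a.station, a.dtime, a.temp
          FROM temps a
         WHERE a.temp = (SELECT MAX(temp) FROM temps b
                          WHERE b.station = a.station)
         ORDER BY a.station, a.dtime""",
    "self join": """
        SELECT maximum.station, maximum.dtime, maximum.temp
          FROM temps maximum
          LEFT JOIN temps higher
            ON maximum.station = higher.station
           AND maximum.temp < higher.temp
         WHERE higher.station IS NULL
           AND maximum.temp IS NOT NULL
         ORDER BY maximum.station, maximum.dtime""",
    "joined subquery": """
        SELECT a.station, a.dtime, a.temp
          FROM temps a
         INNER JOIN (SELECT station, MAX(temp) AS temp
                       FROM temps GROUP BY station) b
            ON a.station = b.station AND a.temp = b.temp
         ORDER BY a.station, a.dtime""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

The self join variant reads as "keep the rows for which no higher temperature exists at the same station", which is why the unmatched side of the LEFT JOIN is tested for NULL.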
Comment from a Blog Post

„I left joined a table with itself once, and a lightning fast query
had a second or so delay. Two joins, and it was slow. Three joins,
and it took upwards of 15 seconds. This kind of joining a table to
itself repeatedly kills database performance.“ (John)
A few Words of Caution
• Rows to scan: n^m (rows ^ table_references)
• 15‘000 rows (example table) (0.003s @ 5 Mio. rows/s)
‣ joined once: n^2 = 225‘000‘000 rows (45s)
‣ joined twice: n^3 = 3‘375‘000‘000‘000 rows (7.8 days)
‣ joined three times: n^4 = 50‘625‘000‘000‘000‘000 rows (321 years)
• Check your JOIN conditions and indexes
• Do some EXPLAINs
‣ „EXPLAIN demystified“
(Baron Schwartz, today, 2:00pm, Ballroom D)
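The row counts and times in the list above follow directly from n^m. A few lines of Python reproduce them, taking the 15,000 rows and the 5 million rows per second from the slide; the function name scan_cost is made up.

```python
ROWS = 15_000            # rows in the example table
RATE = 5_000_000         # rows scanned per second, as assumed on the slide


def scan_cost(table_refs):
    """Rows to scan and seconds needed for an unrestricted join."""
    rows = ROWS ** table_refs
    return rows, rows / RATE


for refs in range(1, 5):
    rows, seconds = scan_cost(refs)
    print(f"{refs} table reference(s): {rows:,} rows, "
          f"{seconds:,.3f} s = {seconds / 86_400:,.1f} days "
          f"= {seconds / (365.25 * 86_400):,.0f} years")
```

These are worst-case numbers for joins without usable conditions or indexes, which is exactly what the EXPLAIN check is meant to rule out.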
A few Words of Caution: Many Joins
• Time spent executing query
• Before that: Time spent finding execution plan!
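Sudoku

5 3 . . 7 . . . .
6 . . 1 9 5 . . .
. 9 8 . . . . 6 .
8 . . . 6 . . . 3
4 . . 8 . 3 . . 1
7 . . . 2 . . . 6
. 6 . . . . 2 8 .
. . . 4 1 9 . . 5
. . . . 8 . . 7 9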
Sudoku: Fill every square (3×3 box) with all digits 1–9
Sudoku: No digit repeated in any column or row
Solve a Sudoku with one Query?
• SQL: One „solution“ equals „one row“
‣ There might be more than one solution
‣ Sudoku „spread out“ horizontally (one column per field)
• Table `digits` holding the „base material“: 1, 2, 3, 4, 5…
‣ Self Joins: One table reference for every field
‣ 9-by-9: 81 table references („80 Joins“)
‣ MySQL Limit: 61 Joins (31 back in MySQL 3.23)
How to Solve a Sudoku „Brute Force“

. . 6 . . .
. . . 6 . 3
4 . . 1 . .
1 . . 3 . 6
. . 5 4 6 .
. . . . . 1

First row, field by field (the other rows stay as above):

1 . 6 . . .    1 is already in the first column
2 . 6 . . .
2 1 6 . . .
2 1 6 1 . .    1 is already in this row
2 1 6 2 . .    2 is already in this row
2 1 6 ? . .    3, 4 and 6 are already in the fourth column
2 1 6 5 . .
How to Solve a Sudoku „Brute Force“
• Try all 6 digits for a field
‣ Still no solution?
‣ Backtrack!
Erase field
Try something different in the previous field
Sometimes this means „back to square one“
• So, a long, long time later…
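The procedure in the list above can be sketched in a few lines of imperative code. This is only an illustration of the "try, erase, backtrack" idea, not the method used later in the talk; the starting grid is taken from the `start` table of the presentation, the 2-by-3 box shape is inferred from the constraints in the query, and all function names are made up.

```python
N, BOX_ROWS, BOX_COLS = 6, 2, 3   # 6x6 grid, boxes of 2 rows by 3 columns
START = {(1, 3): 6, (2, 4): 6, (2, 6): 3, (3, 1): 4, (3, 4): 1, (4, 1): 1,
         (4, 4): 3, (4, 6): 6, (5, 3): 5, (5, 4): 4, (5, 5): 6, (6, 6): 1}


def conflicts(grid, i, j, digit):
    """True if digit already occurs in the row, column or box of (i, j)."""
    for (r, c), d in grid.items():
        if d != digit:
            continue
        same_box = ((r - 1) // BOX_ROWS == (i - 1) // BOX_ROWS
                    and (c - 1) // BOX_COLS == (j - 1) // BOX_COLS)
        if r == i or c == j or same_box:
            return True
    return False


def solve(grid, cells, solutions):
    """Fill the open cells in order; collect every complete grid."""
    if not cells:
        solutions.append(dict(grid))
        return
    (i, j), rest = cells[0], cells[1:]
    for digit in range(1, N + 1):          # try all 6 digits for a field
        if not conflicts(grid, i, j, digit):
            grid[(i, j)] = digit
            solve(grid, rest, solutions)
            del grid[(i, j)]               # backtrack: erase the field


def as_rows(solution):
    return [[solution[(i, j)] for j in range(1, N + 1)]
            for i in range(1, N + 1)]


open_cells = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)
              if (i, j) not in START]
solutions = []
solve(dict(START), open_cells, solutions)
for solution in solutions:
    for row in as_rows(solution):
        print(" ".join(map(str, row)))
    print()
```

Because the search keeps going after the first hit, the list ends up with every solution of the grid, not just one.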
How to Solve a Sudoku „Brute Force“

5 3 6 2 1 4
2 4 1 6 5 3
4 6 3 1 2 5
1 5 2 3 4 6
3 1 5 4 6 2
6 2 4 5 . 1

Last open field (row 6, column 5): try 1, then 2, then 3:

6 2 4 5 3 1
How to Solve a Sudoku „Brute Force“
• We‘re not finished yet!
‣ There might be another solution…
‣ So, backtrack and try other possibilities…
Solving a Sudoku with one SELECT (1)
SELECT CONCAT(
d11.d, ' ', d12.d, ' ', d13.d, ' ', d14.d, ' ', d15.d, ' ', d16.d, ' ', CHAR(10),
d21.d, ' ', d22.d, ' ', d23.d, ' ', d24.d, ' ', d25.d, ' ', d26.d, ' ', CHAR(10),
d31.d, ' ', d32.d, ' ', d33.d, ' ', d34.d, ' ', d35.d, ' ', d36.d, ' ', CHAR(10),
d41.d, ' ', d42.d, ' ', d43.d, ' ', d44.d, ' ', d45.d, ' ', d46.d, ' ', CHAR(10),
d51.d, ' ', d52.d, ' ', d53.d, ' ', d54.d, ' ', d55.d, ' ', d56.d, ' ', CHAR(10),
d61.d, ' ', d62.d, ' ', d63.d, ' ', d64.d, ' ', d65.d, ' ', d66.d, ' ', CHAR(10)
) AS solution
FROM digits d11
INNER JOIN digits d12
ON COALESCE(d12.d = ( SELECT d FROM start WHERE i = 1 AND j = 2 ), 1)
AND d12.d != d11.d
INNER JOIN digits d13
ON COALESCE(d13.d = ( SELECT d FROM start WHERE i = 1 AND j = 3 ), 1)
AND d13.d != d11.d AND d13.d != d12.d
INNER JOIN digits d14
ON COALESCE(d14.d = ( SELECT d FROM start WHERE i = 1 AND j = 4 ), 1)
AND d14.d != d11.d AND d14.d != d12.d AND d14.d != d13.d
INNER JOIN digits d15
ON COALESCE(d15.d = ( SELECT d FROM start WHERE i = 1 AND j = 5 ), 1)
AND d15.d != d11.d AND d15.d != d12.d AND d15.d != d13.d AND d15.d != d14.d
INNER JOIN digits d16
ON COALESCE(d16.d = ( SELECT d FROM start WHERE i = 1 AND j = 6 ), 1)
AND d16.d != d11.d AND d16.d != d12.d AND d16.d != d13.d AND d16.d != d14.d AND d16.d != d15.d
INNER JOIN digits d21
ON COALESCE(d21.d = ( SELECT d FROM start WHERE i = 2 AND j = 1 ), 1)
AND d21.d != d11.d
AND d21.d != d12.d AND d21.d != d13.d
INNER JOIN digits d22
ON COALESCE(d22.d = ( SELECT d FROM start WHERE i = 2 AND j = 2 ), 1)
AND d22.d != d21.d
AND d22.d != d12.d
AND d22.d != d11.d AND d22.d != d13.d
INNER JOIN digits d23
ON COALESCE(d23.d = ( SELECT d FROM start WHERE i = 2 AND j = 3 ), 1)
AND d23.d != d21.d AND d23.d != d22.d
AND d23.d != d13.d
AND d23.d != d11.d AND d23.d != d12.d
Solving a Sudoku with one SELECT (2)
INNER JOIN digits d24
ON COALESCE(d24.d = ( SELECT d FROM start WHERE i = 2 AND j = 4 ), 1)
AND d24.d != d21.d AND d24.d != d22.d AND d24.d != d23.d
AND d24.d != d14.d
AND d24.d != d15.d AND d24.d != d16.d
INNER JOIN digits d25
ON COALESCE(d25.d = ( SELECT d FROM start WHERE i = 2 AND j = 5 ), 1)
AND d25.d != d21.d AND d25.d != d22.d AND d25.d != d23.d AND d25.d != d24.d
AND d25.d != d15.d
AND d25.d != d14.d AND d25.d != d16.d
INNER JOIN digits d26
ON COALESCE(d26.d = ( SELECT d FROM start WHERE i = 2 AND j = 6 ), 1)
AND d26.d != d21.d AND d26.d != d22.d AND d26.d != d23.d AND d26.d != d24.d AND d26.d != d25.d
AND d26.d != d16.d
AND d26.d != d14.d AND d26.d != d15.d
INNER JOIN digits d31
ON COALESCE(d31.d = ( SELECT d FROM start WHERE i = 3 AND j = 1 ), 1)
AND d31.d != d11.d AND d31.d != d21.d
INNER JOIN digits d32
ON COALESCE(d32.d = ( SELECT d FROM start WHERE i = 3 AND j = 2 ), 1)
AND d32.d != d31.d
AND d32.d != d12.d AND d32.d != d22.d
INNER JOIN digits d33
ON COALESCE(d33.d = ( SELECT d FROM start WHERE i = 3 AND j = 3 ), 1)
AND d33.d != d31.d AND d33.d != d32.d
AND d33.d != d13.d AND d33.d != d23.d
INNER JOIN digits d34
ON COALESCE(d34.d = ( SELECT d FROM start WHERE i = 3 AND j = 4 ), 1)
AND d34.d != d31.d AND d34.d != d32.d AND d34.d != d33.d
AND d34.d != d14.d AND d34.d != d24.d
INNER JOIN digits d35
ON COALESCE(d35.d = ( SELECT d FROM start WHERE i = 3 AND j = 5 ), 1)
AND d35.d != d31.d AND d35.d != d32.d AND d35.d != d33.d AND d35.d != d34.d
AND d35.d != d15.d AND d35.d != d25.d
INNER JOIN digits d36
ON COALESCE(d36.d = ( SELECT d FROM start WHERE i = 3 AND j = 6 ), 1)
AND d36.d != d31.d AND d36.d != d32.d AND d36.d != d33.d AND d36.d != d34.d AND d36.d != d35.d
AND d36.d != d16.d AND d36.d != d26.d
Solving a Sudoku with one SELECT (3)
INNER JOIN digits d41
ON COALESCE(d41.d = ( SELECT d FROM start WHERE i = 4 AND j = 1 ), 1)
AND d41.d != d11.d AND d41.d != d21.d AND d41.d != d31.d
AND d41.d != d32.d AND d41.d != d33.d
INNER JOIN digits d42
ON COALESCE(d42.d = ( SELECT d FROM start WHERE i = 4 AND j = 2 ), 1)
AND d42.d != d41.d
AND d42.d != d12.d AND d42.d != d22.d AND d42.d != d32.d
AND d42.d != d31.d AND d42.d != d33.d
INNER JOIN digits d43
ON COALESCE(d43.d = ( SELECT d FROM start WHERE i = 4 AND j = 3 ), 1)
AND d43.d != d41.d AND d43.d != d42.d
AND d43.d != d13.d AND d43.d != d23.d AND d43.d != d33.d
AND d43.d != d31.d AND d43.d != d32.d
INNER JOIN digits d44
ON COALESCE(d44.d = ( SELECT d FROM start WHERE i = 4 AND j = 4 ), 1)
AND d44.d != d41.d AND d44.d != d42.d AND d44.d != d43.d
AND d44.d != d14.d AND d44.d != d24.d AND d44.d != d34.d
AND d44.d != d35.d AND d44.d != d36.d
INNER JOIN digits d45
ON COALESCE(d45.d = ( SELECT d FROM start WHERE i = 4 AND j = 5 ), 1)
AND d45.d != d41.d AND d45.d != d42.d AND d45.d != d43.d AND d45.d != d44.d
AND d45.d != d15.d AND d45.d != d25.d AND d45.d != d35.d
AND d45.d != d34.d AND d45.d != d36.d
INNER JOIN digits d46
ON COALESCE(d46.d = ( SELECT d FROM start WHERE i = 4 AND j = 6 ), 1)
AND d46.d != d41.d AND d46.d != d42.d AND d46.d != d43.d AND d46.d != d44.d AND d46.d != d45.d
AND d46.d != d16.d AND d46.d != d26.d AND d46.d != d36.d
AND d46.d != d34.d AND d46.d != d35.d
INNER JOIN digits d51
ON COALESCE(d51.d = ( SELECT d FROM start WHERE i = 5 AND j = 1 ), 1)
AND d51.d != d11.d AND d51.d != d21.d AND d51.d != d31.d AND d51.d != d41.d
INNER JOIN digits d52
ON COALESCE(d52.d = ( SELECT d FROM start WHERE i = 5 AND j = 2 ), 1)
AND d52.d != d51.d
AND d52.d != d12.d AND d52.d != d22.d AND d52.d != d32.d AND d52.d != d42.d
Solving a Sudoku with one SELECT (4)
INNER JOIN digits d53
ON COALESCE(d53.d = ( SELECT d FROM start WHERE i = 5 AND j = 3 ), 1)
AND d53.d != d51.d AND d53.d != d52.d
AND d53.d != d13.d AND d53.d != d23.d AND d53.d != d33.d AND d53.d != d43.d
INNER JOIN digits d54
ON COALESCE(d54.d = ( SELECT d FROM start WHERE i = 5 AND j = 4 ), 1)
AND d54.d != d51.d AND d54.d != d52.d AND d54.d != d53.d
AND d54.d != d14.d AND d54.d != d24.d AND d54.d != d34.d AND d54.d != d44.d
INNER JOIN digits d55
ON COALESCE(d55.d = ( SELECT d FROM start WHERE i = 5 AND j = 5 ), 1)
AND d55.d != d51.d AND d55.d != d52.d AND d55.d != d53.d AND d55.d != d54.d
AND d55.d != d15.d AND d55.d != d25.d AND d55.d != d35.d AND d55.d != d45.d
INNER JOIN digits d56
ON COALESCE(d56.d = ( SELECT d FROM start WHERE i = 5 AND j = 6 ), 1)
AND d56.d != d51.d AND d56.d != d52.d AND d56.d != d53.d AND d56.d != d54.d AND d56.d != d55.d
AND d56.d != d16.d AND d56.d != d26.d AND d56.d != d36.d AND d56.d != d46.d
INNER JOIN digits d61
ON COALESCE(d61.d = ( SELECT d FROM start WHERE i = 6 AND j = 1 ), 1)
AND d61.d != d11.d AND d61.d != d21.d AND d61.d != d31.d AND d61.d != d41.d AND d61.d != d51.d
AND d61.d != d52.d AND d61.d != d53.d
INNER JOIN digits d62
ON COALESCE(d62.d = ( SELECT d FROM start WHERE i = 6 AND j = 2 ), 1)
AND d62.d != d61.d
AND d62.d != d12.d AND d62.d != d22.d AND d62.d != d32.d AND d62.d != d42.d AND d62.d != d52.d
AND d62.d != d51.d AND d62.d != d53.d
INNER JOIN digits d63
ON COALESCE(d63.d = ( SELECT d FROM start WHERE i = 6 AND j = 3 ), 1)
AND d63.d != d61.d AND d63.d != d62.d
AND d63.d != d13.d AND d63.d != d23.d AND d63.d != d33.d AND d63.d != d43.d AND d63.d != d53.d
AND d63.d != d51.d AND d63.d != d52.d
INNER JOIN digits d64
ON COALESCE(d64.d = ( SELECT d FROM start WHERE i = 6 AND j = 4 ), 1)
AND d64.d != d61.d AND d64.d != d62.d AND d64.d != d63.d
AND d64.d != d14.d AND d64.d != d24.d AND d64.d != d34.d AND d64.d != d44.d AND d64.d != d54.d
AND d64.d != d55.d AND d64.d != d56.d
Solving a Sudoku with one SELECT (5)
INNER JOIN digits d65
ON COALESCE(d65.d = ( SELECT d FROM start WHERE i = 6 AND j = 5 ), 1)
AND d65.d != d61.d AND d65.d != d62.d AND d65.d != d63.d AND d65.d != d64.d
AND d65.d != d15.d AND d65.d != d25.d AND d65.d != d35.d AND d65.d != d45.d AND d65.d != d55.d
AND d65.d != d54.d AND d65.d != d56.d
INNER JOIN digits d66
ON COALESCE(d66.d = ( SELECT d FROM start WHERE i = 6 AND j = 6 ), 1)
AND d66.d != d61.d AND d66.d != d62.d AND d66.d != d63.d AND d66.d != d64.d AND d66.d != d65.d
AND d66.d != d16.d AND d66.d != d26.d AND d66.d != d36.d AND d66.d != d46.d AND d66.d != d56.d
AND d66.d != d54.d AND d66.d != d55.d
WHERE COALESCE(d11.d = ( SELECT d FROM start WHERE i = 1 AND j = 1 ), 1)
Table `digits` for the „pool“ of digits
+---+
| d |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+---+
Table `start` for initial conditions
+---+---+------+
| i | j | d    |
+---+---+------+
| 1 | 3 |    6 |
| 2 | 4 |    6 |
| 2 | 6 |    3 |
| 3 | 1 |    4 |
| 3 | 4 |    1 |
| 4 | 1 |    1 |
| 4 | 4 |    3 |
| 4 | 6 |    6 |
| 5 | 3 |    5 |
| 5 | 4 |    4 |
| 5 | 5 |    6 |
| 6 | 6 |    1 |
+---+---+------+

The same initial conditions as a grid:

. . 6 . . .
. . . 6 . 3
4 . . 1 . .
1 . . 3 . 6
. . 5 4 6 .
. . . . . 1
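Writing out 35 joins by hand is error-prone, so the query can also be generated. The sketch below is an assumption-laden illustration, not the code from the talk: it uses Python's sqlite3 module instead of MySQL, selects the 36 fields as separate columns instead of using CONCAT, and builds the row, column and box comparisons from a 2-by-3 box shape inferred from the slides. CROSS JOIN is used because SQLite does not reorder tables across it, which pins the field-by-field order; in the talk this choice is left to the MySQL optimizer. All helper names are made up.

```python
import sqlite3

N, BOX_ROWS, BOX_COLS = 6, 2, 3   # box shape inferred from the constraints
START = [(1, 3, 6), (2, 4, 6), (2, 6, 3), (3, 1, 4), (3, 4, 1), (4, 1, 1),
         (4, 4, 3), (4, 6, 6), (5, 3, 5), (5, 4, 4), (5, 5, 6), (6, 6, 1)]
CELLS = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]


def alias(cell):
    return "d%d%d" % cell


def same_unit(a, b):
    """True if two fields share a row, a column or a box."""
    same_box = ((a[0] - 1) // BOX_ROWS == (b[0] - 1) // BOX_ROWS
                and (a[1] - 1) // BOX_COLS == (b[1] - 1) // BOX_COLS)
    return a[0] == b[0] or a[1] == b[1] or same_box


def precondition(cell):
    # No row in `start`: the subquery yields NULL, COALESCE turns it into 1.
    return ("COALESCE(%s.d = (SELECT d FROM start WHERE i = %d AND j = %d), 1)"
            % (alias(cell), cell[0], cell[1]))


def build_query():
    lines = ["SELECT " + ", ".join(alias(c) + ".d" for c in CELLS),
             "FROM digits d11"]
    for k, cell in enumerate(CELLS[1:], start=1):
        conds = [precondition(cell)]
        # Compare with every earlier field in the same row, column or box.
        conds += ["%s.d != %s.d" % (alias(cell), alias(prev))
                  for prev in CELLS[:k] if same_unit(cell, prev)]
        lines.append("CROSS JOIN digits %s ON %s"
                     % (alias(cell), " AND ".join(conds)))
    lines.append("WHERE " + precondition(CELLS[0]))
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE digits (d INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO digits VALUES (?)",
                 [(d,) for d in range(1, N + 1)])
conn.execute("CREATE TABLE start (i INTEGER, j INTEGER, d INTEGER, "
             "PRIMARY KEY (i, j))")
conn.executemany("INSERT INTO start VALUES (?, ?, ?)", START)

solutions = conn.execute(build_query()).fetchall()
for solution in solutions:
    for r in range(N):
        print(" ".join(str(d) for d in solution[r * N:(r + 1) * N]))
    print()
```

Each result row is one complete grid, read row by row, which matches the „one solution equals one row“ idea from the earlier slide.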
How the query works: First, second and third field

FROM digits d11                        -- first field: nothing to compare with yet
INNER JOIN digits d12                  -- second field
ON COALESCE(
d12.d = ( SELECT d FROM start
WHERE i = 1 AND j = 2 ),               -- pre-condition from `start`, if there is one
1                                      -- no pre-condition: NULL becomes 1 (true)
)
AND d12.d != d11.d                     -- must differ from the first field
INNER JOIN digits d13                  -- third field
ON COALESCE(
d13.d = ( SELECT d FROM start
WHERE i = 1 AND j = 3 ),
1
)
AND d13.d != d11.d AND d13.d != d12.d  -- must differ from the first two fields

How the query works: Last field

INNER JOIN digits d66
ON COALESCE( … )
-- same row
AND d66.d != d61.d AND d66.d != d62.d AND
d66.d != d63.d AND d66.d != d64.d AND
d66.d != d65.d
-- same column
AND d66.d != d16.d AND d66.d != d26.d AND
d66.d != d36.d AND d66.d != d46.d AND
d66.d != d56.d
-- same box
AND d66.d != d54.d AND d66.d != d55.d

Conclusions from the „Sudoku-Case“
• Declarative Paradigm (Constraint Programming)
‣ Don‘t care about the „how“, but about the „what“
‣ Optimizer does a great job!
• (Ab-)use built-in Backtracking of Join Engine
• A query might look awkward – but still performs!
Some reasons for reasonable performance…
• Very small table (`digits`) and covering index
• Small result set: Always working on one row!
• Subqueries basically optimized away
‣ „Impossible WHERE noticed“ (no pre-condition case)
‣ Constant (pre-condition case)
• Optimizer/Join Engine is good at this stuff!
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+-------------------------------+
| 1 | PRIMARY | d11 | index | NULL | PRIMARY | 1 | NULL | 6 | Using where; Using index |
| 1 | PRIMARY | d12 | index | NULL | PRIMARY | 1 | NULL | 6 | Using where; Using index |
| … | … | … | … | … | … | … | … | … | … |
| 37 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after|
| | | | | | | | | | reading const tables |
| 36 | SUBQUERY | start | const | PRIMARY | PRIMARY | 2 | const,const | 1 | |
| … | … | … | … | … | … | … | … | … | … |
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+-------------------------------+
72 rows in set (0.01 sec)
Final Message
• Have fun with the declarative power of SQL!
‣ Despite its flaws…
• Do it the SQL way!
• Slides and code will be made available on the conference website
• Check out the Developer Zone on the MySQL website for an upcoming article version of last year‘s session, „The declarative power of VIEWs“
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-sa/3.0/
or send a letter to
Creative Commons, 171 Second Street, Suite 300,
San Francisco, California, 94105, USA.
