
A Presentation

Query Tuning

Plaza Semanggi 9 Fl, Unit 9 http://www.equnix.asia


Jl. Jend Sudirman Kav 50, Jakarta - 12930
+6221-22866662 | info@equnix.asia
INDONESIA
Table of Contents

Topic
1. Slow Query
2. Indexing
3. SQL Tuning
4. Explain
Query Tuning - Why

Why we need Query tuning?


SQL statements retrieve data from the database, and the same result can
usually be obtained from several differently written queries. When
performance matters, the choice of query is important, so queries should
be tuned to the requirement.

A central question is whether a query runs as a Sequential Scan or an Index Scan.


Query Tuning - Finding The Problem

First, Find out Slow Queries


❖ From Application (you don’t say)
❖ pg_stat_activity (idle or locking)
❖ pg_log (normal)
➢ Capture slow queries by setting the log_min_duration_statement parameter in postgresql.conf
❖ pg_stat_statements (yey)
➢ pg_stat_statements is a PostgreSQL extension
➢ Located in ‘{PGSRC}/contrib’
Query Tuning - pg_stat_activity

How to Trace ACTIVE and IDLE Queries?


❖ ACTIVE
postgres=# SELECT now() - query_start AS waiting_duration,
                  pid, client_addr, query
           FROM pg_stat_activity
           WHERE state = 'active'
           ORDER BY 1 DESC;

❖ IDLE (the query column shows the last statement the session ran)
postgres=# SELECT now() - query_start AS waiting_duration,
                  pid, client_addr, query
           FROM pg_stat_activity
           WHERE state = 'idle'
           ORDER BY 1 DESC;
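
Sessions that hold locks are often in the 'idle in transaction' state rather than plain 'idle'. The query below is a minimal sketch of how such sessions might be listed; the five-minute threshold is an illustrative assumption, not a recommendation from the slides.

```sql
-- Sessions that opened a transaction and then went quiet; these can hold locks.
-- The 5-minute threshold is an arbitrary value chosen for illustration.
SELECT pid,
       usename,
       client_addr,
       now() - xact_start   AS transaction_age,
       now() - state_change AS idle_for,
       query                AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes'
ORDER BY transaction_age DESC;
```

The oldest transactions appear first, which is usually where to start when hunting for a blocking session.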
Query Tuning - PG_LOG

Ideal PostgreSQL Configuration For pg_log

1. log_destination = 'stderr'
2. logging_collector = on # change requires restart
3. log_directory = 'pg_log'
4. log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
5. log_rotation_age = 1d
6. log_rotation_size = 10MB
7. log_min_error_statement = error
8. log_min_duration_statement = 5000 #milliseconds; 0 = all; -1 = disable
9. log_line_prefix = '|%m|%r|%a|%d|%u|%e|'
10. log_statement = 'mod'
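
The slow-query threshold can also be changed from a superuser session instead of editing the file by hand. This is a sketch assuming PostgreSQL 9.4 or later (where ALTER SYSTEM is available); it reuses the 5-second value from item 8.

```sql
-- Log every statement slower than 5 seconds (same value as item 8 above).
ALTER SYSTEM SET log_min_duration_statement = '5s';

-- Reload the configuration; this parameter does not need a restart.
SELECT pg_reload_conf();

-- Confirm the active value.
SHOW log_min_duration_statement;
```

ALTER SYSTEM writes to postgresql.auto.conf, which overrides postgresql.conf, so the two should not be left with conflicting values.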
Query Tuning - pg_stat_statements

How to Configure pg_stat_statements?

❖ Compile pg_stat_statements in ‘{PGSRC}/contrib’


❖ Edit postgresql.conf located in ‘{PGDATA}’
➢ shared_preload_libraries = 'pg_stat_statements'
➢ pg_stat_statements.max = 10000
■ maximum number of statements (rows) tracked in the pg_stat_statements view
➢ pg_stat_statements.track = all
■ top = track top-level statements (those issued directly by clients)
■ all = also track nested statements (such as statements invoked within functions)
■ none = disable statement statistics collection
➢ pg_stat_statements.save = on
■ off = statistics are not saved at shutdown nor reloaded at server start
❖ Restart Server!
Query Tuning - pg_stat_statements

How to Install pg_stat_statements?

❖ postgres=# CREATE EXTENSION pg_stat_statements;


❖ postgres=# SELECT pg_stat_statements_reset();
➢ Discards all statistics gathered so far (empties the pg_stat_statements view)
❖ postgres=# SELECT * FROM pg_stat_statements;
Query Tuning - pg_stat_statements

pg_stat_statements Tips and Tricks?


Get Statistic Query (from PostgreSQL 13 onward the total_time column is named total_exec_time)
postgres=#
SELECT
  total_time || ' ms' AS total_time,
  (total_time / calls) || ' ms' AS average_time,
  substring(query, 1, 30)
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_catalog%'
ORDER BY total_time / calls DESC
LIMIT 100;
Query Tuning - pg_stat_statements

pg_stat_statements Tips and Tricks?


Get Statistic Calls
postgres=#
SELECT
  substring(query, 1, 25),
  calls,
  total_time || ' ms' AS total_time,
  rows,
  100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_catalog%'
ORDER BY calls DESC
LIMIT 100;
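
On PostgreSQL 13 and later the timing columns were renamed (total_time became total_exec_time, and mean_exec_time is provided directly). The following is a sketch of how the two reports above might be combined on those versions; the 60-character truncation and the LIMIT are arbitrary choices.

```sql
-- PostgreSQL 13+: uses total_exec_time / mean_exec_time instead of total_time.
SELECT round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       calls,
       rows,
       round(100.0 * shared_blks_hit
             / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_percent,
       substring(query, 1, 60) AS query_head
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_catalog%'
ORDER BY mean_exec_time DESC
LIMIT 20;
```

Sorting by mean_exec_time surfaces individually slow statements; sorting by total_exec_time instead surfaces statements that are cheap per call but dominate overall load.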
INDEX

How to use Index ?


❖ Create indexes that match the columns, and their order, used in the WHERE clause
(composite, function-based, etc.)
❖ Get a deeper understanding of how an index is used with EXPLAIN
ANALYZE/ANALYSE
❖ Prefer an Index Scan to a Sequential Scan
❖ Consider GiST or GIN indexes when dealing with special data
types and operators
❖ postgres=# CREATE INDEX i_test_idx ON test USING btree (name);
INDEX

Index types supported by PostgreSQL:

1. B-Tree (the default one)


2. Hash (discouraged before PostgreSQL 10)
3. GiST
4. SP-GiST
5. GIN (inverted Index)
6. BRIN (Range Index)
INDEX - BTREE

B-Tree Index
❖ Balanced tree
❖ Default index type in PostgreSQL
❖ Used unless another index type is named with the USING keyword
❖ B-trees can handle equality and range queries on data that can be
sorted into some ordering (integers preferred)
❖ The query planner will consider using a B-tree index whenever an indexed
column is involved in a comparison using one of these operators:
➢ <   <=   =   >=   >
INDEX - HASH

HASH Index
❖ Create the index with the USING keyword
❖ Hash indexes can only handle simple equality (=) comparisons
❖ Before PostgreSQL 10, hash index operations are not WAL-logged
❖ On those versions, hash indexes might need to be rebuilt with REINDEX after a
database crash if there were unwritten changes
❖ That is why hash indexes were discouraged there; a B-tree is also more versatile
INDEX - GIST

GIST Index
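❖ Create the index with the USING keyword
❖ Generalized Search Tree
❖ For full-text search, geometric types and range types
❖ For full-text search a GiST index is lossy, meaning that the index may produce
false matches that must be rechecked against the table row
❖ Although lookups are lossy, GiST is faster than GIN on UPDATE
❖ Not recommended for transactional (OLTP) workloads
❖ Suited to document-style search workloads (for jsonb columns use GIN;
PostgreSQL has no built-in GiST operator class for jsonb)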
INDEX - SP-GIST

SP-GIST Index
❖ Create Index using “using” keyword
❖ like GIST Index
❖ SP-GiST permits implementation of a wide range of different
non-balanced disk-based data structures, such as quadtrees, k-d trees,
and radix trees (tries)
INDEX - GIN
GIN Index
❖ Create the index with the USING keyword
❖ Generalized Inverted Index
❖ For full-text search, arrays and jsonb
❖ GIN indexes are not lossy for standard queries
❖ Still not recommended for transactional workloads (unless they run standard text-search queries)
❖ GIN index lookups are about three times faster than GiST
❖ GIN indexes take about three times longer to build than GiST
❖ GIN indexes are moderately slower to update than GiST indexes, but
about 10 times slower if fast-update support is disabled
❖ GIN indexes are two to three times larger than GiST indexes
❖ Good for SELECT queries on jsonb data (the plain json type has no GIN operator class)
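
As an illustration of the jsonb case, the sketch below builds a GIN index and runs a containment query. The events table, its columns and the index name are hypothetical and exist only for this example.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- The default GIN operator class for jsonb supports @>, ?, ?| and ?&.
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- Containment query that the index is able to serve.
EXPLAIN
SELECT id FROM events WHERE payload @> '{"status": "paid"}';
```

On an empty or very small table the planner may still pick a sequential scan, so the plan is only meaningful once the table holds a realistic amount of data.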
INDEX - BRIN

BRIN Index
❖ Create the index with the USING keyword
❖ Block Range Index
❖ BRIN indexes can speed things up considerably on very large tables
❖ Much smaller than a B-tree index
❖ Can decrease disk usage by up to 25%
❖ But only if the data has a strong natural ordering to begin with (for example, dates)
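
A minimal sketch of a BRIN index on naturally ordered data, assuming PostgreSQL 9.5 or later. The sensor_log table and the pages_per_range value of 32 are hypothetical choices for illustration.

```sql
-- Hypothetical append-only table whose rows arrive in timestamp order.
CREATE TABLE sensor_log (
    recorded_at timestamptz NOT NULL,
    reading     numeric
);

-- pages_per_range defaults to 128; smaller values give finer ranges and a larger index.
CREATE INDEX sensor_log_recorded_brin
    ON sensor_log USING brin (recorded_at) WITH (pages_per_range = 32);

-- Compare the index size with the table size once data has been loaded.
SELECT pg_size_pretty(pg_relation_size('sensor_log_recorded_brin')) AS brin_size,
       pg_size_pretty(pg_relation_size('sensor_log'))               AS table_size;
```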
INDEX - MISSING INDEX

How to Find Missing Index?


dbname=# SELECT relname, seq_scan - idx_scan AS too_much_seq,
           CASE WHEN seq_scan - idx_scan > 0 THEN 'Missing Index?' ELSE 'OK' END,
           seq_scan, idx_scan
         FROM pg_stat_all_tables
         WHERE schemaname = 'public'
         ORDER BY too_much_seq DESC;

     relname     | too_much_seq |      case      | seq_scan | idx_scan
-----------------+--------------+----------------+----------+----------
 pgbench_teller  |              | OK             |        0 |
 pgbench_branche |              | OK             |        0 |
 pgbench_account |              | OK             |        1 |
 pgbench_history |              | OK             |        0 |
 test            |            4 | Missing Index? |        7 |        3
(5 rows)
INDEX - UNUSED INDEX

How to Find Unused Index?


dbname=# SELECT indexrelid::regclass AS index,
           relid::regclass AS table,
           'DROP INDEX ' || indexrelid::regclass || ';' AS drop_statement
         FROM pg_stat_user_indexes
         JOIN pg_index USING (indexrelid)
         WHERE idx_scan = 0 AND indisunique IS FALSE;

   index    | table |     drop_statement
------------+-------+------------------------
 i_test_idx | test2 | DROP INDEX i_test_idx;
(1 row)
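
Before dropping anything it helps to know how much space each unused index occupies. The variant below is a sketch that adds the index size and sorts by it; the column aliases are arbitrary.

```sql
-- Unused, non-unique indexes ordered by the disk space they occupy.
SELECT s.indexrelid::regclass                         AS index_name,
       s.relid::regclass                              AS table_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

The idx_scan counter only covers the period since statistics were last reset, so an index used by a monthly report can look unused if the counters are recent.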
INDEX

Index Tips and Trick?


❖ Aggregate functions cannot be indexed (parallel workers to the rescue)
❖ Avoid ‘OR’ in the WHERE clause (it needs two bitmap scans)
❖ Avoid ‘LIKE’ patterns that start with ‘%’; for example, prefer ‘TEST%’ to ‘%TEST%’
❖ Prefer ‘AND’ or ‘UNION ALL’ in SQL queries (allows an index-only scan)
❖ Use the right index type (B-Tree, GIN, GiST, BRIN)
❖ Use INT as the KEY wherever possible
❖ Define the behavior of each table: master (reference) or transaction
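
The LIKE tip can be sketched as follows, assuming the test table has a text column called name (as in the earlier CREATE INDEX slide) and that the database does not use the C locale. The index name is hypothetical.

```sql
-- In a non-C locale a plain B-tree index cannot serve LIKE 'TEST%';
-- the text_pattern_ops operator class can.
CREATE INDEX i_test_name_pattern ON test (name text_pattern_ops);

-- Left-anchored pattern: the index can be used.
EXPLAIN SELECT * FROM test WHERE name LIKE 'TEST%';

-- Leading wildcard: this B-tree index cannot help.
EXPLAIN SELECT * FROM test WHERE name LIKE '%TEST%';
```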
SQL Tuning - SELECT

SQL Tips and Trick? - SELECT


The SQL query becomes faster if you use the actual column names in the
SELECT statement instead of '*'.

For Example:
Write the query as
SELECT id, first_name, last_name, age, subject FROM student_details;
Instead of:
SELECT * FROM student_details;
SQL Tuning - HAVING
SQL Tips and Trick? - HAVING
The HAVING clause filters groups after the rows have been selected and grouped.
Use WHERE for conditions on plain columns, and keep HAVING for conditions on aggregates.
For Example:

Write the query as


SELECT subject, count(subject)
FROM student_details
WHERE subject != 'Science'
AND subject != 'Maths'
GROUP BY subject;

Instead of:
SELECT subject, count(subject)
FROM student_details
GROUP BY subject
HAVING subject != 'Science' AND subject != 'Maths';
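
For contrast, here is a sketch of a case where HAVING is the right tool, because the condition is on an aggregate. The threshold of 10 is an arbitrary value for illustration.

```sql
-- Filter rows first (WHERE), then filter groups on an aggregate (HAVING).
SELECT subject, count(*) AS students
FROM student_details
WHERE subject <> 'Science'
GROUP BY subject
HAVING count(*) > 10;
```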
SQL Tuning - SUBQUERY
SQL Tips and Trick? - SUBQUERY
Sometimes the main query contains more than one subquery.
Try to minimize the number of subquery blocks in your query.

For Example:
Write the query as
SELECT name
FROM employee
WHERE (salary, age ) = (SELECT MAX (salary), MAX (age)
FROM employee_details)
AND dept = 'Electronics';

Instead of:
SELECT name
FROM employee
WHERE salary = (SELECT MAX(salary) FROM employee_details)
AND age = (SELECT MAX(age) FROM employee_details)
AND dept = 'Electronics';
SQL Tuning - IN and EXISTS

SQL Tips and Trick? - IN and EXISTS


❖ IN usually has the slowest performance
❖ IN is efficient when most of the filter criteria are in the subquery
❖ EXISTS is efficient when most of the filter criteria are in the main query

Example IN
SELECT * FROM Orders WHERE ProductNumber IN (1, 10, 100);

Example EXISTS (uncorrelated: returns every order as soon as any order has ProductNumber > 10)
SELECT * FROM Orders WHERE EXISTS ( SELECT * FROM Orders WHERE
ProductNumber > 10);
SQL Tuning - IN

Example IN
postgres=# explain analyze select * from test where i in (select i from test where i > 999);

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Merge Semi Join  (cost=0.87..782829.40 rows=9999031 width=72) (actual time=0.342..7745.913 rows=9999001 loops=1)
   Merge Cond: (test.i = test_1.i)
   ->  Index Scan using test_pkey on test  (cost=0.43..303935.09 rows=9999977 width=72) (actual time=0.013..2030.917 rows=10000000 loops=1)
   ->  Index Only Scan using test_pkey on test test_1  (cost=0.43..328906.48 rows=9999031 width=4) (actual time=0.025..1924.827 rows=9999001 loops=1)
         Index Cond: (i > 999)
         Heap Fetches: 9999001
 Planning time: 14.564 ms
 Execution time: 8113.224 ms
(8 rows)
SQL Tuning - EXISTS

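Example EXISTS
Note: this subquery is not correlated with the outer query, so the statement returns every
row of test (10000000 rows) rather than the 9999001 rows of the IN example; the two timings
do not compare equivalent queries.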
postgres=# explain analyze select * from test where exists (select * from test where i > 999);

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Result  (cost=0.02..144247.79 rows=9999977 width=72) (actual time=0.021..1947.417 rows=10000000 loops=1)
   One-Time Filter: $0
   InitPlan 1 (returns $0)
     ->  Seq Scan on test test_1  (cost=0.00..169247.71 rows=9999031 width=0) (actual time=0.012..0.012 rows=1 loops=1)
           Filter: (i > 999)
   ->  Seq Scan on test  (cost=0.00..144247.77 rows=9999977 width=72) (actual time=0.007..893.664 rows=10000000 loops=1)
 Planning time: 0.127 ms
 Execution time: 2306.229 ms
(8 rows)
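
A like-for-like comparison needs a correlated EXISTS. The sketch below shows an IN query and its equivalent EXISTS form; the Products table and its Discontinued column are hypothetical and are not part of the original slides.

```sql
-- Hypothetical schema: Orders(ProductNumber, ...) and
-- Products(ProductNumber, Discontinued boolean).

-- IN form
SELECT o.*
FROM Orders o
WHERE o.ProductNumber IN (SELECT p.ProductNumber
                          FROM Products p
                          WHERE p.Discontinued);

-- Equivalent correlated EXISTS form
SELECT o.*
FROM Orders o
WHERE EXISTS (SELECT 1
              FROM Products p
              WHERE p.ProductNumber = o.ProductNumber
                AND p.Discontinued);
```

PostgreSQL commonly plans both forms as a semi join, so running EXPLAIN ANALYZE on each is the reliable way to see whether one is actually faster on a given data set.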
SQL Tuning - UNION and UNION ALL

SQL Tips and Trick? - UNION and UNION ALL


❖ Try to use UNION ALL in place of UNION
❖ UNION removes duplicate records (where all columns in the results are
the same)
❖ UNION ALL does not, so it is faster
❖ Use UNION/UNION ALL instead of ‘OR’
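
Replacing OR with UNION ALL changes the result if a row matches both branches, because UNION ALL keeps duplicates. The sketch below shows one way to keep the branches disjoint; it assumes the employee table (id, name, dept, location) used in the WHERE examples.

```sql
-- Original form:
--   SELECT id, name FROM employee
--   WHERE dept = 'Electronics' OR location = 'Bangalore';

-- UNION ALL form with disjoint branches, so no row is returned twice.
SELECT id, name
FROM employee
WHERE dept = 'Electronics'
UNION ALL
SELECT id, name
FROM employee
WHERE location = 'Bangalore'
  AND dept IS DISTINCT FROM 'Electronics';  -- also keeps rows where dept is NULL
```

Each branch can then use its own index, and no duplicate-removal step is needed.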
SQL Tuning - UNION

Example: original query using OR
SELECT /*+ FIRST_ROWS */ PI_FORMAT.F_CAR(A.CAR) AS CAR, A.KD_KANTOR,
A.IMPNAMA, B.PPJKNAMA, F_UR_TASK(A.STATUS),
TO_CHAR(A.WK_STATUS,'DD/MM/YYYY HH24:MI:SS') AS WK_STATUS ,
COALESCE(A.JMLARTAS,0), A.SEQ, A.ID_TRADER
FROM TBLTRACKINGHDR A INNER JOIN TBLPIBHDR B ON
A.CUSDECID=B.CUSDECID
LEFT JOIN TBLORGANIZATION d ON A.IMPNPWP = D.NOIDORG
WHERE (A.ID_TRADER = '' OR D.ORGID = '13054') AND A.TAHUN
IN('2017') AND (date_trunc('day', A.WK_STATUS) BETWEEN
TO_DATE('01/01/2017','DD/MM/YYYY') AND
TO_DATE('05/01/2017','DD/MM/YYYY')) ORDER BY A.WK_STATUS DESC

COST: 50 - 230 seconds


SQL Tuning - UNION
Example: the same query rewritten with UNION
SELECT * FROM (SELECT a.*, row_number() over () as rnum FROM (
SELECT /*+ FIRST_ROWS */ PI_FORMAT.F_CAR(A.CAR) AS CAR, A.KD_KANTOR, A.IMPNAMA, B.PPJKNAMA,
F_UR_TASK(A.STATUS),
TO_CHAR(A.WK_STATUS,'DD/MM/YYYY HH24:MI:SS') AS WK_STATUS , COALESCE(A.JMLARTAS,0), A.SEQ,
A.ID_TRADER
FROM TBLTRACKINGHDR A INNER JOIN TBLPIBHDR B ON A.CUSDECID=B.CUSDECID
LEFT JOIN TBLORGANIZATION d ON A.IMPNPWP = D.NOIDORG
WHERE (A.ID_TRADER = '') AND A.TAHUN IN('2017') AND (date_trunc('day', A.WK_STATUS) BETWEEN
TO_DATE('01/01/2017','DD/MM/YYYY') AND TO_DATE('05/01/2017','DD/MM/YYYY'))
UNION
SELECT /*+ FIRST_ROWS */ PI_FORMAT.F_CAR(A.CAR) AS CAR, A.KD_KANTOR, A.IMPNAMA, B.PPJKNAMA,
F_UR_TASK(A.STATUS),
TO_CHAR(A.WK_STATUS,'DD/MM/YYYY HH24:MI:SS') AS WK_STATUS , COALESCE(A.JMLARTAS,0), A.SEQ,
A.ID_TRADER
FROM TBLTRACKINGHDR A INNER JOIN TBLPIBHDR B ON A.CUSDECID=B.CUSDECID
LEFT JOIN TBLORGANIZATION d ON A.IMPNPWP = D.NOIDORG
WHERE (D.ORGID = '13054') AND A.TAHUN IN('2017') AND (date_trunc('day', A.WK_STATUS) BETWEEN
TO_DATE('01/01/2017','DD/MM/YYYY') AND TO_DATE('05/01/2017','DD/MM/YYYY'))
) a ) b ORDER BY WK_STATUS DESC

COST: 15 - 20 seconds
SQL Tuning - WHERE

SQL Tips and Trick? - WHERE


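For Example:
Where the logic allows, write a range condition such as
SELECT id, first_name, age FROM student_details WHERE age > 10;
Instead of a negation such as:
SELECT id, first_name, age FROM student_details WHERE age != 10;
(The two conditions are not equivalent; the point is that a B-tree index can serve a range comparison but not !=.)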

Write the query as


SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE 'Chan%';
Instead of:
SELECT id, first_name, age
FROM student_details
WHERE SUBSTR(first_name,1,3) = 'Cha';
SQL Tuning - WHERE
SQL Tips and Trick? - WHERE
For Example:
Write the query as
SELECT product_id, product_name
FROM product
WHERE unit_price BETWEEN (SELECT MIN(unit_price) FROM product)
                     AND (SELECT MAX(unit_price) FROM product);
Instead of:
SELECT product_id, product_name
FROM product
WHERE unit_price >= (SELECT MIN(unit_price) FROM product)
AND unit_price <= (SELECT MAX(unit_price) FROM product);

Write the query as


SELECT id, name, salary
FROM employee
WHERE dept = 'Electronics'
AND location = 'Bangalore';
Instead of:
SELECT id, name, salary
FROM employee
WHERE dept || location= 'ElectronicsBangalore';
SQL Tuning - WHERE
SQL Tips and Trick? - WHERE
Keep the column alone on one side of the comparison and the constant expression on the other, so that an index on the column can be used
For Example:
Write the query as
SELECT id, name, salary
FROM employee
WHERE salary < 25000;
Instead of:
SELECT id, name, salary
FROM employee
WHERE salary + 10000 < 35000;

Write the query as


SELECT id, first_name, age
FROM student_details
WHERE age > 10;
Instead of:
SELECT id, first_name, age
FROM student_details
WHERE NOT (age = 10);
SQL Tuning - MISC

SQL Tips and Trick? - Misc


To write queries that are consistent and easy to tune, follow the general
SQL standard rules

❖ Use uppercase for all SQL verbs


❖ Begin all SQL verbs on a new line
❖ Separate all words with a single space
❖ Right- or left-align verbs beneath the initial SQL verb

For monitoring, consider creating a materialized view and refreshing it
periodically rather than querying the live tables in real time
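
The materialized-view suggestion might look like the sketch below, assuming PostgreSQL 9.4 or later for the CONCURRENTLY option. The view and index names are hypothetical; student_details is the table from the earlier examples.

```sql
-- Hypothetical monitoring view over student_details.
CREATE MATERIALIZED VIEW subject_counts AS
SELECT subject, count(*) AS students
FROM student_details
GROUP BY subject;

-- REFRESH ... CONCURRENTLY requires a unique index on the materialized view.
CREATE UNIQUE INDEX subject_counts_subject_idx ON subject_counts (subject);

-- Run periodically (for example from cron); readers are not blocked meanwhile.
REFRESH MATERIALIZED VIEW CONCURRENTLY subject_counts;
```

The trade-off is staleness: dashboards read the snapshot from the last refresh, not the live tables.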
SQL Tuning - INDEXES

Summary INDEXES
❖ Eliminate Sequential Scans (Seq Scan) by adding indexes (unless table
size is small)
❖ If using a multicolumn index, make sure you pay attention to order in
which you define the included columns
❖ Try to use indexes that are highly selective on commonly-used data.
This will make their use more efficient
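
The effect of column order in a multicolumn index can be checked with EXPLAIN. This is a sketch using the employee table from the WHERE examples; the index name is hypothetical.

```sql
-- Composite index with dept as the leading column.
CREATE INDEX employee_dept_location_idx ON employee (dept, location);

-- Leading column constrained: the index can be used efficiently.
EXPLAIN SELECT id, name FROM employee
WHERE dept = 'Electronics' AND location = 'Bangalore';

EXPLAIN SELECT id, name FROM employee
WHERE dept = 'Electronics';

-- Leading column not constrained: the index is far less useful here.
EXPLAIN SELECT id, name FROM employee
WHERE location = 'Bangalore';
```

If queries filter on location alone as often as on dept, a second index on (location) may be worth its write overhead.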
SQL Tuning - WHERE

Summary WHERE Clause


❖ Avoid LIKE
➢ If unavoidable, use LIKE with ‘%’ only at the end of the pattern
❖ Avoid function calls in the WHERE clause
➢ If unavoidable, create an index on the function call (expression index)
❖ Avoid large IN() lists
➢ Use EXISTS
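
An index on a function call can be sketched with the SUBSTR example from the earlier WHERE slide. The index name is hypothetical.

```sql
-- Expression index matching the function call used in the WHERE clause.
CREATE INDEX student_first3_idx
    ON student_details (substr(first_name, 1, 3));

-- The predicate must use the same expression for the planner to consider the index.
EXPLAIN
SELECT id, first_name, age
FROM student_details
WHERE substr(first_name, 1, 3) = 'Cha';
```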
SQL Tuning - SUBQUERY and JOIN

Summary SUBQUERY and JOIN Clause


❖ When joining tables, try to use a simple equality statement in the ON clause
(i.e. a.id = b.person_id). Doing so allows more efficient join techniques to be
used (i.e. Hash Join rather than Nested Loop Join)
❖ Convert subqueries to JOIN statements when possible, as this usually allows
the optimizer to understand the intent and possibly choose a better plan
❖ Use JOINs properly: Are you using GROUP BY or DISTINCT just because you are
getting duplicate results? This usually indicates improper JOIN usage and may
result in a higher cost
❖ Avoid correlated subquery where possible; they can significantly increase
query cost (subquery that uses values from the outer query)
❖ Use EXISTS when checking for existence of rows based on criterion because it
“short-circuits” (Boolean values comparison)
SQL Tuning - Explain

Understand EXPLAIN
❖ The EXPLAIN command is by far the most essential tool when it comes to tuning
queries
❖ It tells you what is really going on
❖ Get a good understanding of the information this command gives,
know how to use it, and fix your queries so that they
run faster.
❖ ANALYZE/ANALYSE (PostgreSQL accepts both the American and the British spelling)
SQL Tuning - Explain

Understand EXPLAIN
postgres=# explain analyze select * from test;

QUERY PLAN
--------------------------------------------------------------------
Seq Scan on test (cost=0.00..144247.77 rows=9999977 width=72)
(actual time=0.011..4172.020 rows=10000000 loops=1)
Planning time: 0.041 ms
Execution time: 4608.374 ms
(3 rows)
SQL Tuning - Explain

Understand EXPLAIN
❖ Node
➢ logical unit of work (a “step” if you will) with an associated cost and execution time

❖ Seq Scan
➢ sequential scan: reads the whole table without using an index

❖ Index Scan
➢ scan that uses an index

❖ Cost
➢ cost to get the first row: 0.00
➢ cost to get all rows: 144247.77
➢ the numbers are in arbitrary planner units, by convention relative to one sequential page fetch (seq_page_cost = 1.0)
SQL Tuning - Explain

Understand EXPLAIN
❖ Rows
➢ number of rows the node returns (cost part: the planner's estimate from table statistics; actual part: rows really returned)

❖ Width
➢ average width of a row
➢ in bytes

❖ Actual time
➢ measured time in milliseconds: time to the first row .. time to the last row

❖ Planning time
➢ time spent building the query plan
❖ Execution time
➢ real execution time
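
Beyond the fields above, the BUFFERS option shows how much of the work was served from cache. The sketch below assumes the test table with integer primary key i from the other examples; the UPDATE is illustrative only.

```sql
-- BUFFERS adds shared-buffer hit/read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test WHERE i = 999;

-- EXPLAIN ANALYZE really executes the statement, so wrap data-modifying
-- statements in a transaction and roll it back afterwards.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE test SET i = i WHERE i = 999;
ROLLBACK;
```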
SQL Tuning - Explain

Understand EXPLAIN Using INDEX


postgres=# explain analyze select * from test where i = 999;

QUERY PLAN
--------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.43..8.45 rows=1
width=72) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: (i = 999)
Planning time: 0.081 ms
Execution time: 0.040 ms
(4 rows)
SQL Tuning - Explain FORMAT

Understand EXPLAIN FORMAT


❖ Specify the output format, which can be TEXT, XML, JSON, or YAML
❖ Non-text output contains the same information as the text output
format, but is easier for programs to parse
❖ Default TEXT
SQL Tuning - Explain FORMAT
Understand EXPLAIN FORMAT - JSON
postgres=# explain (format json) select * from test where i = 999;

QUERY PLAN
--------------------------------------------------------------------
[ +
{ +
"Plan": { +
"Node Type": "Index Scan", +
"Scan Direction": "Forward",+
"Index Name": "test_pkey", +
"Relation Name": "test", +
"Alias": "test", +
"Startup Cost": 0.43, +
"Total Cost": 8.45, +
"Plan Rows": 1, +
"Plan Width": 72, +
"Index Cond": "(i = 999)" +
} +
} +
]
(1 row)
Question
