Robert Bright

Ten Tips for Writing Efficient SQL
By Robert Bright
Abstract
As a Web Developer at the Ontario Universities' Application Centre (OUAC), I worked extensively with SQL and database programming, and I learned several techniques for writing more efficient SQL statements. This presentation shares those techniques so that future co-op students can benefit from them.
Tip #3
Use OR instead of UNION on the same table
When selecting data from a single table with a logical "or" between conditions, it is tempting to write a UNION because each half of the query is easy to read on its own. This is inefficient: the database evaluates two SELECTs on the same table and must combine them in an intermediate table to remove duplicates. Putting both conditions in one WHERE clause joined by OR eliminates the second query and the intermediate table. Because UNION removes duplicate rows, the OR version needs DISTINCT to return the same result.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find a specific file that belonged to a university. I was tempted to use a UNION to find the exact data, but an OR proved to be more efficient.
Before:
    SELECT hemenbr, hename FROM buma.helpfiles WHERE hemenbr = 5
    UNION
    SELECT hemenbr, hename FROM buma.helpfiles WHERE hename = 'help_address.html'
After:
    SELECT DISTINCT hemenbr, hename FROM buma.helpfiles
    WHERE hemenbr = 5 OR hename = 'help_address.html'
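The equivalence of the two forms can be checked on a small table. The following is a minimal sketch using Python's built-in sqlite3 module rather than the OUAC database; the table contents are invented for illustration, and the buma schema prefix is dropped.

```python
import sqlite3

# In-memory stand-in for buma.helpfiles; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helpfiles (hemenbr INTEGER, hename TEXT)")
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?)",
    [
        (5, "help_address.html"),
        (5, "help_name.html"),
        (5, "help_name.html"),   # duplicate row, removed by UNION and by DISTINCT
        (7, "help_address.html"),
        (7, "help_name.html"),   # matches neither condition
    ],
)

union_rows = conn.execute(
    "SELECT hemenbr, hename FROM helpfiles WHERE hemenbr = 5 "
    "UNION "
    "SELECT hemenbr, hename FROM helpfiles WHERE hename = 'help_address.html'"
).fetchall()

or_rows = conn.execute(
    "SELECT DISTINCT hemenbr, hename FROM helpfiles "
    "WHERE hemenbr = 5 OR hename = 'help_address.html'"
).fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted lists.
same_result = sorted(union_rows) == sorted(or_rows)
print(same_result)
```

Dropping DISTINCT from the second query would return the duplicated row twice, which is why the After query keeps it.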
Tip #7
Use IN instead of EXISTS
A simple way to speed up an EXISTS sub query is to replace it with IN. In my tests the IN version was faster because the sub query does not have to be re-checked for every row of the outer table. For the two forms to return the same rows, the EXISTS sub query must be correlated with the outer query (cpcgnbr = cgrfnbr); without that condition it returns every category as soon as any matching row exists.
Example: One of the options for the degree listing program I wrote at OUAC was to list all the available degrees at a specific university. So if I were checking for the University of Guelph, I would look for all the degrees associated with university number 149. By replacing the EXISTS in the sub query with an IN, I made the query more efficient.
Before:
    select cgrfnbr from category
    where EXISTS (select cpcgnbr from cgprrel where cpprnbr = 149 and cpcgnbr = cgrfnbr)
After:
    select cgrfnbr from category
    where cgrfnbr IN (select cpcgnbr from cgprrel where cpprnbr = 149)
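An EXISTS sub query only matches the IN form when it refers back to the outer row. The sketch below illustrates this with Python's sqlite3 module; the category and relation rows are made up, and only the column names come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (cgrfnbr INTEGER)")
conn.execute("CREATE TABLE cgprrel (cpcgnbr INTEGER, cpprnbr INTEGER)")
conn.executemany("INSERT INTO category VALUES (?)", [(1,), (2,), (3,)])
# Invented data: categories 1 and 2 belong to university 149, category 3 does not.
conn.executemany(
    "INSERT INTO cgprrel VALUES (?, ?)", [(1, 149), (2, 149), (3, 150)]
)

# EXISTS without a link to the outer row: true for every category.
uncorrelated = conn.execute(
    "SELECT cgrfnbr FROM category WHERE EXISTS "
    "(SELECT cpcgnbr FROM cgprrel WHERE cpprnbr = 149)"
).fetchall()

# EXISTS correlated with the outer row.
correlated = conn.execute(
    "SELECT cgrfnbr FROM category WHERE EXISTS "
    "(SELECT cpcgnbr FROM cgprrel "
    " WHERE cpprnbr = 149 AND cpcgnbr = category.cgrfnbr)"
).fetchall()

# The IN form.
in_rows = conn.execute(
    "SELECT cgrfnbr FROM category WHERE cgrfnbr IN "
    "(SELECT cpcgnbr FROM cgprrel WHERE cpprnbr = 149)"
).fetchall()

print(sorted(uncorrelated), sorted(correlated), sorted(in_rows))
```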
Information about the Employer

The Ontario Universities' Application Centre (OUAC), located in Guelph, Ontario, Canada, is a central bureau whose key function is the processing of applications for admission to the province's universities.
Job Description
I worked at OUAC as a Web Developer. I developed web pages to improve the usability of the Ontario university application process. I spent the majority of my time creating two internal systems. The first allows OUAC employees to modify the contents of the help files without needing any programming or HTML skills. The second creates lists of the degree programs available at universities, so users can see everywhere a program is taught at once instead of having to search every university. I developed all the web sites and systems with HTML, JavaScript, SQL, and the IBM scripting language Net.Data.
[Figures: two bar charts of query time in ms, Before vs. After, labelled "17% Time Reduction" and "36% Time Reduction".]
Tip #4
Use EXISTS instead of LEFT JOIN
A LEFT JOIN combines the two tables and keeps every row of the left table, whether or not it has a match. When you only need to know whether a matching row exists, an EXISTS sub query does the job: the inner query acts as a filter while the outer query executes, so the two tables never have to be merged. Note that the results are not identical: the LEFT JOIN also returns universities with no help files and repeats a university once per matching file, while EXISTS returns each university that has a help file exactly once.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find which universities had help files associated with them. By using an EXISTS sub query instead of a LEFT JOIN, I increased the efficiency of this query by avoiding a table comparison.
Before:
    SELECT merfnbr, mestname FROM buma.merchant
    LEFT JOIN buma.helpfiles ON merfnbr = hemenbr
After:
    SELECT merfnbr, mestname FROM buma.merchant
    WHERE EXISTS (SELECT * FROM buma.helpfiles WHERE merfnbr = hemenbr)
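The difference between the rows returned by the two queries is easy to see on a toy data set. This is a sketch with Python's sqlite3 module; the university names and file names are invented, and the buma schema prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER, mestname TEXT)")
conn.execute("CREATE TABLE helpfiles (hemenbr INTEGER, hename TEXT)")
conn.executemany(
    "INSERT INTO merchant VALUES (?, ?)",
    [(1, "Univ A"), (2, "Univ B"), (3, "Univ C")],
)
# Invented data: Univ A has two help files, Univ B has one, Univ C has none.
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?)",
    [(1, "a1.html"), (1, "a2.html"), (2, "b1.html")],
)

left_join_rows = conn.execute(
    "SELECT merfnbr, mestname FROM merchant "
    "LEFT JOIN helpfiles ON merfnbr = hemenbr"
).fetchall()

exists_rows = conn.execute(
    "SELECT merfnbr, mestname FROM merchant "
    "WHERE EXISTS (SELECT * FROM helpfiles WHERE merfnbr = hemenbr)"
).fetchall()

print(sorted(left_join_rows))  # Univ A twice, plus Univ C with no files
print(sorted(exists_rows))     # each university with a help file, once
```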
Tip #8
Avoid including a HAVING clause in SELECT statements
The HAVING clause is rarely needed for conditions on ordinary columns. It works by going through the grouped result of the query and removing the rows that don't meet the HAVING condition. Instead, put the condition in a WHERE clause, so the rows are filtered out before grouping and the results don't need a second pass. (HAVING is still required when the condition tests an aggregate such as COUNT(*).)
Example: In the help file tool I created at OUAC, I had to select all the university numbers except for the one that belonged to the test case. I could have cut out that row with a HAVING clause at the end of the statement, but a WHERE proved to be more efficient.
Before:
    select merfnbr from merchant group by merfnbr having merfnbr != 2
After:
    select merfnbr from merchant where merfnbr != 2 group by merfnbr
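For a condition on a plain grouping column, the WHERE and HAVING forms return the same rows. A minimal sketch with Python's sqlite3 module and invented merchant rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER, mestname TEXT)")
# Invented data: number 2 plays the role of the test case and appears twice.
conn.executemany(
    "INSERT INTO merchant VALUES (?, ?)",
    [(1, "Univ A"), (2, "Test"), (2, "Test"), (3, "Univ C")],
)

# Filter applied after grouping.
having_rows = conn.execute(
    "SELECT merfnbr FROM merchant GROUP BY merfnbr HAVING merfnbr != 2"
).fetchall()

# Filter applied before grouping, so the unwanted rows are never grouped.
where_rows = conn.execute(
    "SELECT merfnbr FROM merchant WHERE merfnbr != 2 GROUP BY merfnbr"
).fetchall()

print(sorted(having_rows) == sorted(where_rows))
```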
Purpose of this Report

On my co-op at OUAC I worked intensively with databases and SQL queries, and I learned several techniques to improve the speed and efficiency of the queries. The intention of this report is to share this knowledge so that current and future co-op students will know how to write better SQL statements. Each technique was tested by running both the original query and the improved query ten times each. I recorded the average time of each query to show the speed increase gained by using the more efficient query.
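The measurement procedure described above (ten runs per query, averaged, then compared) could be reproduced roughly as follows. This is only a sketch in Python with sqlite3, not the tooling used at OUAC; the helper names average_ms and percent_reduction, the table contents, and the numbers passed to percent_reduction are all assumptions made for illustration.

```python
import sqlite3
import time


def average_ms(conn, sql, runs=10):
    """Run sql `runs` times and return the mean elapsed time in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the query fully runs
        total += time.perf_counter() - start
    return total / runs * 1000.0


def percent_reduction(before_ms, after_ms):
    """Time saved by the improved query, as a percentage of the original time."""
    return 100.0 * (before_ms - after_ms) / before_ms


# Hypothetical table, only so the harness has something to run against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helpfiles (hemenbr INTEGER, hename TEXT)")
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?)",
    [(i % 10, "help_%d.html" % i) for i in range(1000)],
)

before_ms = average_ms(conn, "SELECT * FROM helpfiles")
after_ms = average_ms(conn, "SELECT hename FROM helpfiles")
print(before_ms, after_ms)  # measured timings vary from run to run

# Made-up inputs, just to show the formula: 25 ms down to 20 ms.
example_reduction = percent_reduction(25.0, 20.0)
```

Averaging over several runs smooths out caching and scheduling noise, which matters when the queries take only a few milliseconds.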
[Figures: two bar charts of query time in ms, Before vs. After, labelled "23% Time Reduction" and "26% Time Reduction".]
Tip #1
Use Column Names Instead of * in a SELECT Statement
If you are selecting only a few columns from a table, there is no need to use SELECT *. Though this is easier to write, it costs the database more time to complete the query. By selecting only the columns you need, you reduce the size of the result table and in turn increase the speed of the query.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to get each file's information from the database. By replacing the * in my query with the column names, I increased the speed of the query.
Before:
    SELECT * FROM buma.helpfiles
After:
    SELECT heshnbr, hemenbr, hename, hetitle, hecontent, hefield1, hefield2
    FROM buma.helpfiles
Tip #5
Use BETWEEN instead of IN
The BETWEEN keyword is very useful for selecting values in a specific range. It is much faster than listing each value of the range inside an IN. Both ends of a BETWEEN range are included.
Example: While at OUAC I built a small web page that displayed all possible degrees and their information. Each degree belonged to a grouped category. In the database the category numbers were in a specific range, so I was able to benefit from using a BETWEEN instead of listing each value inside an IN.
Before:
    SELECT crpcgnbr FROM cgryrel
    WHERE crpcgnbr IN (508858, 508859, 508860, 508861, 508862, 508863, 508864)
After:
    SELECT crpcgnbr FROM cgryrel
    WHERE crpcgnbr BETWEEN 508858 AND 508864
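For whole-number keys, the BETWEEN range and the explicit IN list select the same rows, including both end points. A small check using Python's sqlite3 module, with invented category numbers on either side of the range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cgryrel (crpcgnbr INTEGER)")
# Invented data: numbers just below, inside, and just above the range.
conn.executemany(
    "INSERT INTO cgryrel VALUES (?)", [(n,) for n in range(508855, 508868)]
)

in_rows = conn.execute(
    "SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr IN "
    "(508858, 508859, 508860, 508861, 508862, 508863, 508864)"
).fetchall()

between_rows = conn.execute(
    "SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr BETWEEN 508858 AND 508864"
).fetchall()

print(sorted(in_rows) == sorted(between_rows))
```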
Tip #9
Select all your data at once
Each time a query is performed there is the overhead cost of having to open a connection to the database. Having many separate queries that select data from the same table is very inefficient, since each query adds its own overhead to the execution time. Combining these queries into one reduces the overhead significantly.
Example: When creating the help file tool at OUAC, I needed to retrieve a lot of data on each file: the file name, the content, the associated university, and so on. Having these selections as different queries proved to be very inefficient, so I put them together into one statement.
Before:
    select hetitle, hename from helpfiles where heshnbr=24;
    select hecontent, hemenbr from helpfiles where heshnbr=24;
After:
    select hetitle, hename, hecontent, hemenbr from helpfiles where heshnbr=24;
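As a rough illustration of combining selections, the sketch below fetches the same four columns first with two statements and then with one. It uses Python's sqlite3 module with a single invented row; an in-memory database has almost no per-query overhead, so it shows only that the results match, not the time saved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE helpfiles (heshnbr INTEGER, hetitle TEXT, hename TEXT, "
    "hecontent TEXT, hemenbr INTEGER)"
)
# Invented row for help file number 24.
conn.execute(
    "INSERT INTO helpfiles VALUES "
    "(24, 'Address', 'help_address.html', 'Enter your address.', 5)"
)

# Before: two statements, each paying its own per-query overhead.
title, name = conn.execute(
    "SELECT hetitle, hename FROM helpfiles WHERE heshnbr = ?", (24,)
).fetchone()
content, merchant = conn.execute(
    "SELECT hecontent, hemenbr FROM helpfiles WHERE heshnbr = ?", (24,)
).fetchone()
two_queries = (title, name, content, merchant)

# After: one statement returns all four columns.
one_query = conn.execute(
    "SELECT hetitle, hename, hecontent, hemenbr FROM helpfiles "
    "WHERE heshnbr = ?",
    (24,),
).fetchone()

print(one_query == two_queries)
```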
[Figures: three bar charts of query time in ms, Before vs. After, labelled "34% Time Reduction", "59% Time Reduction" and "32% Time Reduction".]
Tip #6
Minimize the number of sub queries
Each time a sub query is performed, a new result table must be created and then merged with the outer table, which takes a long time on a database. So it is important to minimize the number of sub queries to speed up the results.
Example: The degree listing program I made at OUAC was based on a very redundant database. All the relationships were put into one of two tables, so sorting out the information was very difficult, and the only way to get the data was to use several sub queries. Simply removing one unnecessary sub query from this statement (the innermost one, replaced by the constant category number 572191) increased the speed significantly.
Before:
    select cgsdesc, cgrfnbr from category
    where cgoid='degree' and cgrfnbr IN
      (select cpprnbr from cgprrel where cpprnbr IN
        (select cpcgnbr from cgprrel where cpprnbr IN
          (select prrfnbr from product
           where prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr IN
               (select cgrfnbr from category where cgoid IS NULL))
           and prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr = 190200))))
After:
    select cgsdesc, cgrfnbr from category
    where cgoid='degree' and cgrfnbr IN
      (select cpprnbr from cgprrel where cpprnbr IN
        (select cpcgnbr from cgprrel where cpprnbr IN
          (select prrfnbr from product
           where prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr = 572191)
           and prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr = 190200))))
Tip #2
Use EXISTS instead of DISTINCT
The DISTINCT keyword works by building the complete result of the query and then removing any duplicate rows. If you use a sub query with the EXISTS keyword instead, the join is replaced by a simple existence test, so you avoid having to return and de-duplicate an entire joined table. The sub query must refer back to the outer row (m.merfnbr = h.hemenbr); otherwise it is true for every help file.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find which universities had help files associated with them. By using an EXISTS sub query instead of DISTINCT, I increased the efficiency of this query.
Before:
    SELECT DISTINCT hetitle, hename
    FROM buma.helpfiles h, buma.merchant m
    WHERE m.merfnbr = h.hemenbr
After:
    SELECT hetitle, hename FROM buma.helpfiles h
    WHERE EXISTS (SELECT m.merfnbr FROM buma.merchant m WHERE m.merfnbr = h.hemenbr)
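The role of the link between the sub query and the outer row can be seen on a small example. This sketch uses Python's sqlite3 module with invented rows, including one help file whose university number has no entry in the merchant table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.execute(
    "CREATE TABLE helpfiles (hemenbr INTEGER, hetitle TEXT, hename TEXT)"
)
conn.executemany("INSERT INTO merchant VALUES (?)", [(1,), (2,)])
# Invented data: the third file points at a university that does not exist.
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?, ?)",
    [
        (1, "Address", "help_address.html"),
        (2, "Name", "help_name.html"),
        (9, "Orphan", "help_orphan.html"),
    ],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT hetitle, hename FROM helpfiles h, merchant m "
    "WHERE m.merfnbr = h.hemenbr"
).fetchall()

# EXISTS tied to the outer row: same rows as the join.
correlated = conn.execute(
    "SELECT hetitle, hename FROM helpfiles h WHERE EXISTS "
    "(SELECT m.merfnbr FROM merchant m WHERE m.merfnbr = h.hemenbr)"
).fetchall()

# EXISTS with no link to the outer row: true for every help file.
uncorrelated = conn.execute(
    "SELECT hetitle, hename FROM helpfiles h WHERE EXISTS "
    "(SELECT m.merfnbr FROM merchant m)"
).fetchall()

print(sorted(distinct_rows) == sorted(correlated), len(uncorrelated))
```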
Tip #10
Remove any redundant mathematics
There will be times when you perform mathematics within an SQL statement. It can be a drag on performance if written improperly: for every row the query examines, it recalculates the math. Eliminating any unnecessary math in the statement will make it perform faster.
Example: The degree listing program I created at OUAC has the option to show a specific range of universities based on their reference numbers. It was easier to show the users a single-digit list and then add 3000 to get the reference number. But having the addition inside the query was inefficient, so I performed the math outside it.
Before:
    SELECT merfnbr FROM buma.merchant WHERE merfnbr + 3000 < 5000;
After:
    SELECT merfnbr FROM buma.merchant WHERE merfnbr < 2000;
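One way to move the arithmetic out of the query is to compute the bound once in the calling code and pass it in as a parameter. The sketch below does this in Python with sqlite3 as a stand-in for the original Net.Data code; the variable names and the merchant numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.executemany(
    "INSERT INTO merchant VALUES (?)", [(1500,), (1999,), (2000,), (2500,)]
)

# Before: the addition is evaluated for every row the query examines.
before_rows = conn.execute(
    "SELECT merfnbr FROM merchant WHERE merfnbr + 3000 < 5000"
).fetchall()

# After: the subtraction happens once, outside the query.
offset = 3000                  # hypothetical offset applied to reference numbers
upper_bound = 5000 - offset    # 2000, computed a single time
after_rows = conn.execute(
    "SELECT merfnbr FROM merchant WHERE merfnbr < ?", (upper_bound,)
).fetchall()

print(sorted(before_rows) == sorted(after_rows))
```

Leaving the column bare on one side of the comparison also gives the database the chance to use an index on that column, if one exists.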
[Figures: three bar charts of query time in ms, Before vs. After, labelled "48% Time Reduction", "41% Time Reduction" and "11% Time Reduction".]
Summary
The purpose of this report was to share the knowledge I gained about writing efficient SQL during my co-op as a web developer at OUAC. Increasing the speed of queries is very important in web development: web pages are viewed thousands of times per day, so even a small speed-up in an SQL query adds up to noticeably faster page loads.