
Ten Tips for Writing Efficient SQL

By Robert Bright

Abstract
As a Web Developer at the Ontario Universities Application Centre (OUAC), I worked a lot with SQL and database programming, and I learned several techniques for writing more efficient SQL statements. The intention of this presentation is to share those techniques so that future co-op students can benefit from this knowledge.

Tip #3
Use OR instead of UNION on the same table

When selecting data from a single table with a logical OR between two conditions, it can be easier to picture the query as a UNION of two SELECT statements. This method is inefficient because the table is queried twice and the results are combined in an unnecessary intermediate table. Putting both conditions in one WHERE clause joined by OR eliminates the extra query and the intermediate table. Because UNION also removes duplicate rows, the OR version uses DISTINCT to return the same result.

Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find a specific file that belonged to a University. I was tempted to use a UNION to find the exact data, but an OR proved to be more efficient.

Before:
    SELECT hemenbr, hename FROM buma.helpfiles WHERE hemenbr = 5
    UNION
    SELECT hemenbr, hename FROM buma.helpfiles WHERE hename = 'help_address.html'

After:
    SELECT DISTINCT hemenbr, hename FROM buma.helpfiles
    WHERE hemenbr = 5 OR hename = 'help_address.html'
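The equivalence of the two forms can be checked on a small table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the OUAC database; the sample rows are hypothetical and the buma. schema prefix is dropped.

```python
import sqlite3

# In-memory stand-in for the help file table; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helpfiles (hemenbr INTEGER, hename TEXT)")
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?)",
    [
        (5, "help_address.html"),  # matches both conditions
        (5, "help_name.html"),     # matches hemenbr = 5 only
        (7, "help_address.html"),  # matches hename only
        (7, "help_name.html"),     # matches neither condition
        (5, "help_name.html"),     # exact duplicate of an earlier row
    ],
)

# UNION form: two passes over the table, duplicates removed by UNION.
union_rows = conn.execute(
    "SELECT hemenbr, hename FROM helpfiles WHERE hemenbr = 5 "
    "UNION "
    "SELECT hemenbr, hename FROM helpfiles WHERE hename = 'help_address.html'"
).fetchall()

# OR form: one pass, duplicates removed by DISTINCT.
or_rows = conn.execute(
    "SELECT DISTINCT hemenbr, hename FROM helpfiles "
    "WHERE hemenbr = 5 OR hename = 'help_address.html'"
).fetchall()

# Row order is not guaranteed, so compare the sorted results.
same_rows = sorted(union_rows) == sorted(or_rows)
print(same_rows, sorted(or_rows))
```

Dropping DISTINCT from the OR form would return the duplicated row twice, which is why it is needed to match UNION.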

Tip #7
Use IN instead of EXISTS

A simple trick to increase the speed of an EXISTS sub query is to replace it with IN. In my tests the IN version was faster because it didn't check unnecessary rows in the comparison. For the two forms to return the same rows, the EXISTS sub query must be correlated with the outer row (here, cpcgnbr = cgrfnbr).

Example: One of the options for the degree listing program I wrote at OUAC was to list all the available degrees at a specific University. So if I were checking for U of Guelph, I would look for all the degrees that were associated with the university number 149. By replacing the EXISTS in the sub query with an IN, I made the query more efficient.

Before:
    select cgrfnbr from category
    where EXISTS (select cpcgnbr from cgprrel where cpprnbr = 149 and cpcgnbr = cgrfnbr)

After:
    select cgrfnbr from category
    where cgrfnbr IN (select cpcgnbr from cgprrel where cpprnbr = 149)
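A small check that IN returns the same rows as an EXISTS sub query, assuming the EXISTS is correlated on cpcgnbr = cgrfnbr. This is only a sketch: it uses Python's sqlite3 module rather than the original database, and the category and relation rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (cgrfnbr INTEGER)")
conn.execute("CREATE TABLE cgprrel (cpcgnbr INTEGER, cpprnbr INTEGER)")
conn.executemany("INSERT INTO category VALUES (?)", [(10,), (20,), (30,)])
conn.executemany(
    "INSERT INTO cgprrel VALUES (?, ?)",
    [
        (10, 149),
        (20, 149),
        (20, 149),  # repeated relation: must not repeat category 20 in the result
        (30, 150),  # belongs to a different university number
    ],
)

exists_rows = conn.execute(
    "SELECT cgrfnbr FROM category WHERE EXISTS "
    "(SELECT cpcgnbr FROM cgprrel WHERE cpprnbr = 149 AND cpcgnbr = cgrfnbr) "
    "ORDER BY cgrfnbr"
).fetchall()

in_rows = conn.execute(
    "SELECT cgrfnbr FROM category WHERE cgrfnbr IN "
    "(SELECT cpcgnbr FROM cgprrel WHERE cpprnbr = 149) "
    "ORDER BY cgrfnbr"
).fetchall()

print(exists_rows == in_rows, in_rows)
```

Which form actually runs faster depends on the database engine and its optimizer, so it is worth timing both on the real data.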

Information about the Employer


The Ontario Universities Application Centre (OUAC), located in Guelph, Ontario, Canada, is a central bureau whose key function is the processing of applications for admission to the province's universities.


Job Description
I worked at OUAC as a Web Developer. I developed web pages to improve the usability of the Ontario university application process. I spent the majority of my time creating two internal systems. The first allowed OUAC employees to modify the contents of the help files without needing any programming or HTML skills. The second created lists of the degree programs available at universities, so users could see at once everywhere a program is taught instead of having to search every university. I developed all the web sites and systems with HTML, JavaScript, SQL, and the IBM scripting language Net.Data.

[Figures: two bar charts of query time in ms, before and after the change, showing time reductions of 17% and 36%.]

Tip #4
Use EXISTS instead of LEFT JOIN

The LEFT JOIN merges the outer table with the inner table and keeps the rows of the outer table that have no match. When you only need to know whether a matching row exists, an EXISTS sub query is enough. This eliminates the need to join the two tables, because the inner query acts as a filter while the outer query executes. The results are not identical: the LEFT JOIN also returns merchants without help files and repeats a merchant once per matching file, while EXISTS returns each merchant that has at least one help file exactly once.

Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find which Universities had help files associated with them. By using an EXISTS sub query instead of a LEFT JOIN, I increased the efficiency of this query by avoiding a table comparison.

Before:
    SELECT merfnbr, mestname FROM buma.merchant
    LEFT JOIN buma.helpfiles ON merfnbr = hemenbr

After:
    SELECT merfnbr, mestname FROM buma.merchant
    WHERE EXISTS (SELECT * FROM buma.helpfiles WHERE merfnbr = hemenbr)
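The difference between the two queries is easiest to see on a few rows. A minimal sketch, assuming sqlite3 in place of the original database; the university names and file names are hypothetical, and the third query is my own addition showing what the join would need in order to match the EXISTS result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER, mestname TEXT)")
conn.execute("CREATE TABLE helpfiles (hemenbr INTEGER, hename TEXT)")
conn.executemany(
    "INSERT INTO merchant VALUES (?, ?)",
    [(1, "Guelph"), (2, "Waterloo"), (3, "Toronto")],
)
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?)",
    [(1, "help_a.html"), (1, "help_b.html"), (2, "help_c.html")],  # none for 3
)

# Plain LEFT JOIN: keeps merchant 3 and repeats merchant 1 for each file.
left_join_rows = sorted(conn.execute(
    "SELECT merfnbr, mestname FROM merchant "
    "LEFT JOIN helpfiles ON merfnbr = hemenbr"
).fetchall())

# EXISTS: one row per merchant that has at least one help file.
exists_rows = sorted(conn.execute(
    "SELECT merfnbr, mestname FROM merchant "
    "WHERE EXISTS (SELECT * FROM helpfiles WHERE merfnbr = hemenbr)"
).fetchall())

# LEFT JOIN restricted to matches and de-duplicated gives the EXISTS result.
filtered_join_rows = sorted(conn.execute(
    "SELECT DISTINCT merfnbr, mestname FROM merchant "
    "LEFT JOIN helpfiles ON merfnbr = hemenbr WHERE hemenbr IS NOT NULL"
).fetchall())

print(left_join_rows)
print(exists_rows)
print(filtered_join_rows == exists_rows)
```

So the EXISTS form is the better fit when the question is "which merchants have help files", and the LEFT JOIN is still needed when unmatched merchants must appear in the output.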


Tip #8
Avoid including a HAVING clause in SELECT statements

The HAVING clause is only needed to filter on aggregate values such as COUNT or SUM. For a condition on an ordinary column it is wasteful: it works by going through the grouped result of the query and removing the rows that don't meet the HAVING condition. Instead, you can put the condition inside the query with a WHERE clause. The condition is then applied while the result table is built, which eliminates having to go back through the results a second time.

Example: In the help file tool I created at OUAC, I had to select all the University numbers except for the one that belonged to the test case. I could have cut out that row with a HAVING clause at the end of the statement, but a WHERE proved to be more efficient.

Before:
    select merfnbr from merchant group by merfnbr having merfnbr != 2

After:
    select merfnbr from merchant where merfnbr != 2 group by merfnbr
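The sketch below checks that moving the column condition from HAVING to WHERE leaves the result unchanged, and adds one aggregate query of my own where HAVING cannot be replaced. It assumes sqlite3 and a made-up merchant table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.executemany("INSERT INTO merchant VALUES (?)", [(1,), (2,), (2,), (3,)])

# Condition on a plain column: HAVING filters after grouping.
having_rows = sorted(conn.execute(
    "SELECT merfnbr FROM merchant GROUP BY merfnbr HAVING merfnbr != 2"
).fetchall())

# Same condition in WHERE: rows are dropped before grouping.
where_rows = sorted(conn.execute(
    "SELECT merfnbr FROM merchant WHERE merfnbr != 2 GROUP BY merfnbr"
).fetchall())

# A condition on an aggregate can only be written with HAVING.
repeated_rows = conn.execute(
    "SELECT merfnbr FROM merchant GROUP BY merfnbr HAVING COUNT(*) > 1"
).fetchall()

print(having_rows == where_rows, where_rows, repeated_rows)
```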

Purpose of this Report


On my co-op at OUAC I worked intensively with databases and SQL queries. I learned several techniques to improve the speed and efficiency of the queries. The intention of this report is to share this knowledge so that current and future co-op students will know how to write better SQL statements. Each technique was tested by running both the original query and the improved query ten times each. I recorded the average time of each query to show the speed increase from using the more efficient query.
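The measurement procedure described here can be sketched in a few lines. This is not the original test harness: the helper name average_ms, the sqlite3 in-memory database, and the generated rows are all assumptions made for illustration.

```python
import sqlite3
import time


def average_ms(conn, sql, runs=10):
    """Run sql `runs` times and return the mean wall-clock time in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the query fully runs
        total += time.perf_counter() - start
    return total / runs * 1000.0


# Hypothetical table filled with generated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helpfiles (heshnbr INTEGER, hename TEXT, hecontent TEXT)")
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?, ?)",
    [(i, f"help_{i}.html", "x" * 1000) for i in range(2000)],
)

before_ms = average_ms(conn, "SELECT * FROM helpfiles")
after_ms = average_ms(conn, "SELECT heshnbr, hename FROM helpfiles")

# Time reduction expressed as a percentage of the original time.
reduction = (before_ms - after_ms) / before_ms * 100.0
print(f"before {before_ms:.3f} ms, after {after_ms:.3f} ms, reduction {reduction:.0f}%")
```

Timings this small vary from run to run, so averaging over several runs, as the report does, gives a steadier comparison than a single measurement.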

[Figures: two bar charts of query time in ms, before and after the change, showing time reductions of 23% and 26%.]

Tip #1
Use Column Names Instead of * in a SELECT Statement

If you are selecting only a few columns from a table, there is no need to use SELECT *. Though this is easier to write, it costs the database more time to complete the query. By selecting only the columns you need, you reduce the size of the result table and in turn increase the speed of the query.

Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to get each file's information from the database. By replacing the * in my query with the column names, I increased the speed of the query.

Before:
    SELECT * FROM buma.helpfiles

After:
    SELECT heshnbr, hemenbr, hename, hetitle, hecontent, hefield1, hefield2
    FROM buma.helpfiles
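To illustrate how much smaller a named-column result can be, the following sketch compares one row fetched both ways. It assumes sqlite3, a single invented help file row, and a page that only needs the title and file name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE helpfiles (heshnbr INTEGER, hemenbr INTEGER, hename TEXT, "
    "hetitle TEXT, hecontent TEXT, hefield1 TEXT, hefield2 TEXT)"
)
# One hypothetical help file with a long page body.
conn.execute(
    "INSERT INTO helpfiles VALUES (24, 5, 'help_address.html', 'Address', ?, '', '')",
    ("<p>" + "long page body " * 500 + "</p>",),
)

star_row = conn.execute("SELECT * FROM helpfiles").fetchone()
named_row = conn.execute("SELECT hetitle, hename FROM helpfiles").fetchone()

# Rough size of the data handed back to the application for one row.
star_chars = sum(len(str(value)) for value in star_row)
named_chars = sum(len(str(value)) for value in named_row)

print(len(star_row), star_chars)
print(len(named_row), named_chars)
```

The saving comes almost entirely from leaving out the large hecontent column when the page does not need it.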

Tip #5
Use BETWEEN instead of IN

The BETWEEN keyword is very useful for selecting values in a specific range, and it includes both end values. It is shorter to write than listing each value of the range in an IN, and in my tests the query also ran faster.

Example: While at OUAC I built a small webpage that displayed all possible degrees and their information. Each degree belonged to a grouped category. In the database the category numbers were in a specific range, so I was able to benefit from using a BETWEEN instead of having each value inside an IN.

Before:
    SELECT crpcgnbr FROM cgryrel
    WHERE crpcgnbr IN (508858, 508859, 508860, 508861, 508862, 508863, 508864)

After:
    SELECT crpcgnbr FROM cgryrel
    WHERE crpcgnbr BETWEEN 508858 AND 508864
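Since BETWEEN includes both end values, it selects the same rows as the IN list when the column holds whole numbers. A quick check, assuming sqlite3 and an invented run of category numbers around the range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cgryrel (crpcgnbr INTEGER)")
# Hypothetical category numbers, some below and some above the wanted range.
conn.executemany(
    "INSERT INTO cgryrel VALUES (?)",
    [(n,) for n in range(508855, 508870)],
)

in_rows = conn.execute(
    "SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr IN "
    "(508858, 508859, 508860, 508861, 508862, 508863, 508864) "
    "ORDER BY crpcgnbr"
).fetchall()

between_rows = conn.execute(
    "SELECT crpcgnbr FROM cgryrel "
    "WHERE crpcgnbr BETWEEN 508858 AND 508864 ORDER BY crpcgnbr"
).fetchall()

print(in_rows == between_rows, len(between_rows))
```

The rewrite only applies when the IN list is a complete, unbroken range; a list with gaps still needs IN.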

Tip #9
Select all your data at once

Each time a query is performed there is the overhead cost of having to open a connection to the database. Having many separate queries that select data from the same table is very inefficient, since each query adds its overhead cost to the execution time. Putting all these queries into one reduces the overhead cost significantly.

Example: When creating the help file tool at OUAC, I needed to retrieve lots of data on each file. I required the file name, the content, the associated University, etc. Having these selections as different queries proved to be very inefficient, so I put them together into one statement.

Before:
    select hetitle, hename from helpfiles where heshnbr=24;
    select hecontent, hemenbr from helpfiles where heshnbr=24;

After:
    select hetitle, hename, hecontent, hemenbr from helpfiles where heshnbr=24;
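Combining the two selections into one statement returns the same values in a single round trip. A minimal sketch, assuming sqlite3 and one made-up help file row with heshnbr 24:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE helpfiles (heshnbr INTEGER, hemenbr INTEGER, "
    "hename TEXT, hetitle TEXT, hecontent TEXT)"
)
conn.execute(
    "INSERT INTO helpfiles VALUES "
    "(24, 5, 'help_address.html', 'Address', '<p>Enter your address.</p>')"
)

# Before: two separate queries against the same row.
title_name = conn.execute(
    "SELECT hetitle, hename FROM helpfiles WHERE heshnbr = ?", (24,)
).fetchone()
content_owner = conn.execute(
    "SELECT hecontent, hemenbr FROM helpfiles WHERE heshnbr = ?", (24,)
).fetchone()

# After: one query returning all four columns.
combined = conn.execute(
    "SELECT hetitle, hename, hecontent, hemenbr FROM helpfiles WHERE heshnbr = ?",
    (24,),
).fetchone()

# Unpack once and use the values wherever the page needs them.
hetitle, hename, hecontent, hemenbr = combined
print(combined == title_name + content_owner)
```

With an in-memory database the per-query overhead is tiny; the saving matters most when each query travels over a network connection.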

[Figures: three bar charts of query time in ms, before and after the change, showing time reductions of 34%, 59% and 32%.]

Tip #6
Minimize the number of sub queries

Each time a sub query is performed, a new result table must be created and then merged with the outer table. This takes a long time on a database, so it is important to minimize the number of sub queries to speed up the results.

Example: The degree listing program I made at OUAC was based on a very redundant database. All the relationships were put into one of two tables, so sorting out the information was very difficult. The only way to get the data was to use several sub queries. Simply removing one unnecessary sub query from this statement, by replacing it with the single value it returned, increased the speed significantly.

Before:
    select cgsdesc, cgrfnbr from category
    where cgoid='degree' and cgrfnbr IN
      (select cpprnbr from cgprrel where cpprnbr IN
        (select cpcgnbr from cgprrel where cpprnbr IN
          (select prrfnbr from product
           where prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr IN
               (select cgrfnbr from category where cgoid IS NULL))
           and prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr = 190200))))

After:
    select cgsdesc, cgrfnbr from category
    where cgoid='degree' and cgrfnbr IN
      (select cpprnbr from cgprrel where cpprnbr IN
        (select cpcgnbr from cgprrel where cpprnbr IN
          (select prrfnbr from product
           where prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr = 572191)
           and prrfnbr IN
             (select cpprnbr from cgprrel where cpcgnbr = 190200))))
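The change between the two statements replaces the innermost sub query with the constant 572191. The sketch below isolates just that step and makes its precondition explicit. It assumes sqlite3, invented rows, and that exactly one category has a NULL cgoid, which the report does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (cgrfnbr INTEGER, cgoid TEXT)")
conn.execute("CREATE TABLE cgprrel (cpcgnbr INTEGER, cpprnbr INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(572191, None), (190200, "group"), (300, "degree")],
)
conn.executemany(
    "INSERT INTO cgprrel VALUES (?, ?)",
    [(572191, 1), (572191, 2), (190200, 2), (300, 3)],
)

# Innermost step as written in the longer statement.
with_subquery = conn.execute(
    "SELECT cpprnbr FROM cgprrel WHERE cpcgnbr IN "
    "(SELECT cgrfnbr FROM category WHERE cgoid IS NULL) ORDER BY cpprnbr"
).fetchall()

# Same step with the sub query replaced by its known result.
with_constant = conn.execute(
    "SELECT cpprnbr FROM cgprrel WHERE cpcgnbr = 572191 ORDER BY cpprnbr"
).fetchall()

# The rewrite is only valid while the sub query returns exactly this value.
null_categories = conn.execute(
    "SELECT cgrfnbr FROM category WHERE cgoid IS NULL"
).fetchall()

print(with_subquery == with_constant, null_categories)
```

Hard-coding the value trades flexibility for speed: if another category with a NULL cgoid is added later, the shorter statement silently stops matching the longer one.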


Tip #2
Use EXISTS instead of DISTINCT

The DISTINCT keyword works by building the full result, here the join of both tables, and then removing any duplicates. If you use a sub query with the EXISTS keyword instead, you can avoid having to build the joined table. The sub query must be correlated with the outer row (m.merfnbr = h.hemenbr); without that condition it matches every help file as soon as the merchant table contains any row.

Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find the help files that were associated with a University. By using an EXISTS sub query instead of DISTINCT, I increased the efficiency of this query.

Before:
    SELECT DISTINCT hetitle, hename
    FROM buma.helpfiles h, buma.merchant m
    WHERE m.merfnbr = h.hemenbr

After:
    SELECT hetitle, hename FROM buma.helpfiles h
    WHERE EXISTS (SELECT m.merfnbr FROM buma.merchant m WHERE m.merfnbr = h.hemenbr)
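The role of the correlation condition can be checked directly. In this sketch, which assumes sqlite3 and hypothetical rows including one help file that belongs to no merchant, the EXISTS sub query is written with m.merfnbr = h.hemenbr and then once more without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE helpfiles (hetitle TEXT, hename TEXT, hemenbr INTEGER)")
conn.executemany("INSERT INTO merchant VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO helpfiles VALUES (?, ?, ?)",
    [
        ("Address", "help_address.html", 1),
        ("Name", "help_name.html", 2),
        ("Old page", "help_old.html", 99),  # no merchant 99 exists
    ],
)

distinct_rows = sorted(conn.execute(
    "SELECT DISTINCT hetitle, hename FROM helpfiles h, merchant m "
    "WHERE m.merfnbr = h.hemenbr"
).fetchall())

# Correlated EXISTS: the sub query refers to the current outer row.
exists_rows = sorted(conn.execute(
    "SELECT hetitle, hename FROM helpfiles h WHERE EXISTS "
    "(SELECT m.merfnbr FROM merchant m WHERE m.merfnbr = h.hemenbr)"
).fetchall())

# Uncorrelated EXISTS: true for every outer row once merchant is non-empty.
uncorrelated_rows = sorted(conn.execute(
    "SELECT hetitle, hename FROM helpfiles h WHERE EXISTS "
    "(SELECT m.merfnbr FROM merchant m)"
).fetchall())

print(distinct_rows == exists_rows, len(uncorrelated_rows))
```

One remaining difference: DISTINCT also merges two different help files that happen to share the same title and name, while EXISTS returns both.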

Tip #10
Remove any redundant mathematics

There will be times when you perform mathematics within an SQL statement. It can be a drag on performance if written improperly: the query recalculates the math for each row it examines, and arithmetic on a column can also keep the database from using an index on that column. So eliminating any unnecessary math in the statement will make it perform faster.

Example: The degree listing program I created at OUAC has the option to show a specific range of Universities based on their reference numbers. It was easier to show the users a single digit list and then add 3000 to get the reference number. But having the addition inside the query was inefficient, so I performed the math outside it.

Before:
    SELECT merfnbr FROM buma.merchant WHERE merfnbr + 3000 < 5000;

After:
    SELECT merfnbr FROM buma.merchant WHERE merfnbr < 2000;
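Performing the math outside the query can mean computing the bound once in application code and passing it in as a parameter. A minimal sketch, assuming sqlite3 and a few invented reference numbers; the variable names offset, upper and limit are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.executemany(
    "INSERT INTO merchant VALUES (?)",
    [(500,), (1999,), (2000,), (2500,)],
)

# Before: the addition is evaluated for every row the query examines.
in_query_rows = conn.execute(
    "SELECT merfnbr FROM merchant WHERE merfnbr + 3000 < 5000 ORDER BY merfnbr"
).fetchall()

# After: the subtraction happens once, outside the query.
offset = 3000
upper = 5000
limit = upper - offset  # 2000
outside_rows = conn.execute(
    "SELECT merfnbr FROM merchant WHERE merfnbr < ? ORDER BY merfnbr",
    (limit,),
).fetchall()

print(in_query_rows == outside_rows, outside_rows)
```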

[Figures: three bar charts of query time in ms, before and after the change, showing time reductions of 48%, 41% and 11%.]

Summary
The purpose of this report was to share the knowledge I gained about writing efficient SQL from my co-op as a web developer at OUAC. Increasing the speed of queries is very important in web development: web pages are viewed thousands of times per day, so even a small increase in the speed of an SQL query can noticeably speed up web page viewing.

