
10 Tips About SQL Server That Every Developer Should Know

Working as a DBA in a corporate environment for over five years, I have seen things that even the least experienced SQL Server developer would find hard to believe. For example, every developer who programs for SQL Server knows, or at least should know, that one of the main requirements for ensuring good query performance is to analyze the query's execution plan and confirm that it makes adequate use of the table's indexes.
However, what I see in my day-to-day work is that many developers still do not give the table's indexes so much as a thought. What one notices, at least at first, is that the developer is more worried about making the query work and delivering the data to the user.
As a result, in the medium term what you get is a high waste of server resources and the much-hated slowness in the application. Of course, there are many other factors that lower an application's performance, for example: outdated data access statistics, blocked connections (most of the time due to missing indexes), excessive use of cursors, and so on.
But you can be certain: poor use of indexes, and even their absence, is the greatest cause of performance problems in SQL Server applications. In this article I will present 10 important tips that every developer should know when working with SQL Server: tips on how to analyze the execution plan, methods for replacing cursors, the use of sub-queries, the use of indexed columns in the WHERE clause; in short, tips that will certainly help you get more out of SQL Server.
Well, given this small introduction, let us move on to what matters.
1. Always analyze the query's execution plan.
As I said previously, analyzing the execution plan is one of the main requirements for ensuring good query performance. The execution plan describes the path used by the SQL Server optimizer to reach the requested data and shows which operations were executed during the query's processing. In the execution plan, each operation is represented by an icon, and a set of arrows connects these icons. This makes it easy to understand a query's performance characteristics.
To see a query's execution plan, open SQL Server 2000's Query Analyzer or SQL Server 2005's query editor, write the query, and press Ctrl+L or select Query > Display Estimated Execution Plan from the menu bar.
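If you prefer to inspect the plan from a script instead of the graphical tool, SET SHOWPLAN_TEXT returns the estimated plan as text without executing the query. A minimal sketch against the pubs titles table (the query itself is just an illustration):

set showplan_text on
go
-- only the estimated plan for the query below is returned
select title, price from titles where price > 10
go
set showplan_text off
go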
You will notice that the query is not actually executed; only the estimated execution plan is created. When analyzing an execution plan, keep in mind that it is generated based on the existing statistics for the tables and indexes used by the query; therefore, it is very important that the objects' access statistics are up to date when you analyze the plan.
If the statistics are not up to date, the execution plan will be generated on top of inconsistent data and may not reflect reality. You should always update the statistics after operations that move large volumes of data, or after creating or altering indexes.
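As an illustration, statistics can be refreshed with the UPDATE STATISTICS command or with the sp_updatestats system procedure; the table name below is just an example from pubs:

-- refresh the statistics of a single table
update statistics titles
-- or refresh the statistics of every table in the current database
exec sp_updatestats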
In Figure 1 we have an example of the execution plan of a query executed over the publishers and titles tables of SQL Server 2000's pubs database.
Figure 1. Example of an execution plan
Observe that each icon represents a specific operation and the arrows indicate the path to be followed. The red arrow indicates that the analysis must always be read from right to left and from top to bottom.

Execution plan
The execution plan describes the path used by SQL Server's optimizer to reach the requested data and shows which operations were executed during the query's processing.

2. When analyzing the execution plan, start by looking for high-cost operations.
During execution plan analysis, begin by looking for operations with a high percentage of the total cost. Looking for high-cost operations lets you prioritize which problem should be "attacked" first. Among the operations with the highest cost, and which therefore should be avoided, we have:
• table scan and index scan operations;
• operations with very thick arrows;
• bookmark lookup operations;
• sort operations.
Table scan and index scan operations
Table scans and index scans are slow operations that generate high server consumption. This is because they navigate through all the rows of the table or index, performing a sequential sweep and returning the rows that satisfy the WHERE clause (assuming you use a WHERE clause).
It is true that, depending on the size of the table, the number of rows being returned, and the quality of the filter, a table scan may not indicate a problem. But when we speak of large tables, the table scan is the worst of all operations and indicates that the table has no index or, if it has one, that the query is not using it adequately.
Whenever you find a table scan in your execution plan, do not refrain from investigating it. The index scan and clustered index scan perform a sequential sweep of a table's index pages. Since they act on an index, they are better than a table scan, but they also deserve investigation. This is because, in general, if you have an index scan, a large amount of data is being returned, and most of the time you do not need all of it.
Most scans are solved by modifying or creating proper indexes. Some solutions also involve making the queries more selective, that is, using the WHERE clause to filter the returned rows as much as possible.
Figure 2 shows the graphic representation of the table scan, clustered index scan, and index scan operators.

Figure 2. Graphic representation of the table scan, clustered index scan, and index scan operators
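As a hypothetical illustration of turning a scan into a seek, suppose a query filters the publishers table of pubs by pub_name and the column has no index; the index name below is an assumption:

-- without an index on pub_name, this filter tends to force a table scan
select pub_id, pub_name from publishers where pub_name = 'new moon books'

-- a nonclustered index on the filtered column allows an index seek instead
create index idx_publishers_pub_name on publishers (pub_name)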
Operations with very thick arrows
The arrows are not operators: they simply connect one operator to another. Through the arrows we get an estimate of the number of rows affected by an operation, since an arrow's thickness is directly related to the number of rows returned by the operation.
The thicker the arrow, the greater the number of rows involved in the operation, or the number of rows passed from one operator to another. To see the estimated count and size of the affected rows, simply place the cursor over the arrow.
When analyzing the execution plan, always pay special attention to the thickest arrows, because a very thick arrow can indicate heavy I/O and, consequently, contention in the server's disk subsystem. Another important point is that, most of the time, very thick arrows are associated with a table scan.
To solve this kind of problem we must once more make use of the WHERE clause to filter the returned data and make the arrow as thin as possible. If the arrow is associated with a table scan, analyze the scan first, because solving the table scan will probably also solve the arrow's thickness. The suggestion here is: avoid retrieving more rows than necessary.
In Figure 3 we have the execution plan of a query (over a modified employee table of the pubs database) which returns almost 700 thousand records. Observe that the query does not use a WHERE clause.

Figure 3. Execution plan for a query without a WHERE clause
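To give an idea of the fix, the sketch below (assuming the modified employee table mentioned above) shows the same query with and without a filter; the narrower the result set, the thinner the arrows in the plan:

-- returns every row: thick arrows and heavy i/o between operators
select emp_id, fname, lname from employee

-- filters early: far fewer rows flow from one operator to the next
select emp_id, fname, lname from employee where hire_date >= '19940101'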


Bookmark lookup operations
The bookmark lookup operator appears when an index can be used to satisfy the search criterion but does not contain all the data requested by the query. It normally occurs together with an index seek, when the query requests columns that are not part of the index key. In this scenario, look for bookmark lookups with a high cost percentage.
In Figure 4 we can observe that, to obtain the data requested by the query, SQL Server executed a bookmark lookup operation, which consumed 41% of the query's total execution time.
Figure 4. Execution plan with a bookmark lookup
This happened because the fname and lname columns are not part of the index key, which is composed only of the hire_date column; SQL Server therefore needs to access the table's data pages to obtain the fname and lname values.
If the bookmark lookup's cost is too high, check whether a clustered index or a nonclustered index composed of the queried columns can be used. In Figure 5, creating a clustered index composed of the fname and lname columns solved the problem.

Figure 5. Execution plan after the creation of a clustered index


A word of advice: whenever possible, avoid the bookmark lookup operation in your queries. Although a bookmark lookup over a small amount of data is not a problem, over large amounts of data this operation increases the I/O rate and consequently hurts query performance.
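One common way to do that, sketched below for the Figure 4 scenario, is to make the index cover the query; the index names are hypothetical, and on SQL Server 2005 the INCLUDE clause offers a lighter alternative to widening the key:

-- the original index holds only hire_date, forcing a bookmark lookup
-- for fname and lname; a composite index covering all three columns
-- lets the query be answered from the index pages alone
create index idx_employee_hire_name on employee (hire_date, fname, lname)

-- sql server 2005 alternative: keep the key small and include the columns
-- create index idx_employee_hire_name2 on employee (hire_date) include (fname, lname)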
Sort operations
The sort operation arranges all the rows in ascending or descending order, depending on the ORDER BY clause of the query. Sort operations, besides using the tempdb system database as a temporary storage area, also add a high I/O rate to the operations.
Therefore, if you frequently see the sort operator in your queries and it has a high cost, consider removing the ORDER BY clause. On the other hand, if you know you will always order your query by a specific column, consider indexing that column.
In the CREATE INDEX command you can determine the ordering direction (ASC or DESC) of a particular index. Figure 6 presents a sort operation consuming 23% of the query's total execution time.
Figure 6. Sort operation with 23% consumption
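As a small sketch of that idea (the index name is an assumption): if a report always lists titles from the most to the least expensive, an index stored in descending order can spare the optimizer the sort step:

-- stored in descending order to match the habitual order by
create index idx_titles_price_desc on titles (price desc)

select title, price from titles order by price desc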
3. Avoid the use of cursors and, whenever possible, replace them with WHILE loops.
The great problem with cursors is that they are, by nature, slow and consume a great deal of server resources. This happens because relational databases are optimized to work with sets of records.
Each set of records is known as a result set and is treated as a single unit. For example, the set of records returned by a SELECT statement consists of all the rows that satisfy the WHERE clause condition.
The cursor goes against this concept, since it was designed for row-by-row work. That is, you use it to navigate row by row within a set of records, or result set, returned by a SELECT statement.
As a consequence of this use, we have a great volume of packets being sent over the network, high compile and parse time for the FETCH statements, and blocked connections due to locks on tables or records; in short, high consumption of server resources and low application performance.
In the face of this, some methods using Transact-SQL emerged that can replace the use of cursors. Next, I will present two of these methods.
First, so that you can understand how a cursor is used, Listing 1 shows a very simple cursor that navigates row by row through the authors table of the pubs database, displaying the id, last name, and first name fields.
Listing 1. Example of cursor usage
1. declare @au_id varchar(15)
2. declare @au_fname varchar(15)
3. declare @au_lname varchar(15)
4. declare cur_authors cursor
5. for select au_id, au_fname,au_lname from authors
6. open cur_authors
7. fetch next from cur_authors into @au_id,@au_fname,
@au_lname
8. while @@fetch_status=0
9. begin
10. select @au_id,@au_fname,@au_lname
11. fetch next from cur_authors into @au_id,
@au_fname,@au_lname
12. end
13. close cur_authors
14. deallocate cur_authors
In the example, lines 1 to 3 declare the variables that will be used to store the data from the fields returned by the SELECT. It is important to observe that these variables must have the same data types as the table's columns.
In lines 4 and 6 we have the declaration and opening of the cursor itself. Observe that the result set generated when the cursor is opened will include all the records, but only the au_id, au_fname, and au_lname columns of the authors table. In line 7, FETCH NEXT takes charge of fetching the next record and filling the variables with the data obtained from it.
From lines 8 through 12 we have the loop, which will keep executing for as long as there are records left (@@fetch_status = 0). In practice, line 8 checks whether there are still records; if there are, line 10 "prints" the content of the variables on the screen, and line 11 fetches the next record and fills the variables again.
Line 13 closes the cursor, and line 14 releases the memory used by it. Given this example, we will now see two methods that can accomplish the same task without cursors.
The first method, presented in Listing 2, makes use of a temporary table and the TOP clause. With this method, you create a snapshot of the desired information by dumping the result of the SELECT into a temporary table.

Listing 2. Method to replace a cursor using a temporary table


1. declare @au_id char(11)
2. select au_id, au_fname, au_lname
into #tb_tmp_authors from authors
3. select top 1 @au_id = au_id from #tb_tmp_authors
4. while @@rowcount <> 0
5. begin
6. select au_id, au_fname, au_lname
from #tb_tmp_authors where au_id = @au_id
7. delete #tb_tmp_authors where au_id = @au_id
8. select top 1 @au_id = au_id from #tb_tmp_authors
9. end
10. drop table #tb_tmp_authors

In the method of Listing 2, line 1 simply declares a variable to store the content of the au_id column. In line 2, we have the same SELECT that was used in Listing 1's cursor declaration. The difference here is that instead of feeding the result of the SELECT into a cursor, the result is dumped into a temporary table called #tb_tmp_authors.
After loading the temporary table, we can then work with it to process the records row by row. Observe that in line 3 the TOP clause is used with the value 1. TOP 1 ensures that only the first record of the temporary table is returned by the SELECT, and consequently the value of the au_id column from that first record is stored in the @au_id variable.
In line 4, the WHILE command makes the loop and checks the value of the @@rowcount global variable. This is a system variable automatically filled with the number of records affected by the SELECT executed in line 3. Since we use TOP 1 in the SELECT, it will always affect one record at a time, and @@rowcount will always be 1 until the temporary table is empty.
Lines 6 through 8 then perform all the desired processing; in this case we only print the data on the screen. Observe that in line 7 the record of the temporary table whose au_id was obtained by the previous SELECT is deleted.
This makes the second record of the temporary table become the first. In line 8, the same SELECT from line 3 is executed again, taking the value of the au_id column from the new first record. And thus the process continues until the temporary table is empty and @@rowcount equals 0. At the end of the processing, line 10 drops the temporary table.
The second method, presented in Listing 3, does not use a temporary table; it uses the min() function to take one record at a time from the authors table.

Listing 3. Method to replace a cursor using the min() function


1. declare @au_id char(11)
2. select @au_id = min(au_id) from authors
3. while @au_id is not null
4. begin
5. select au_id, au_fname,au_lname from authors
6. where au_id = @au_id
7. select @au_id = min( au_id ) from authors
where au_id > @au_id
8. end

Since the method uses the min() function, it is necessary to guarantee that the check is made over a column that is unique and ascending. This guarantees that new rows will always have a larger identifier than that of the row being processed.
In this example, line 1 declares the variable that will store the content of the au_id column. In line 2, the SELECT obtains from the authors table the record that has the smallest value in the au_id column and stores this value in the @au_id variable.
In line 3, the WHILE command makes the loop and verifies that the @au_id variable is not null; when it is null, it means there are no more records to process, and we leave the loop. While @au_id is not null, we enter the loop, and lines 5 to 7 perform the desired processing.
In this case we only print the data on the screen, and in line 7 we obtain a new record whose au_id value is larger than the value already stored in the @au_id variable. And thus the processing continues until it reaches the last record in the table.
As we have seen, we do not always need a cursor to process our data inside SQL Server. If you execute the three examples presented here, you will see that the result is the same.
4. Replace the UNION operator with UNION ALL whenever possible.
When you use the UNION operator to combine the results of two queries, keep in mind that it performs a SELECT DISTINCT on the final result to remove possible duplicate records, even if there are no duplicates.
Given this, the advice is: when duplicate records are not possible, or when duplicates in the final result are not a problem for the application, use the UNION ALL operator. Since this operator does not execute a SELECT DISTINCT on the final result, it uses fewer SQL Server resources and therefore improves query performance.
In Figure 7 we have two queries that perform the same operation over the orders table of the northwind database, one using the UNION operator and the other, UNION ALL. Observe that the query with UNION ALL will display all the records, including duplicates, but consumes fewer server resources because it does not perform the SELECT DISTINCT on the final result.
Figure 7. Example of queries using the UNION and UNION ALL operators
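Figure 7 is not reproduced here, but a sketch in its spirit, assuming filters over the orders table's shipcity, shipcountry, and freight columns, would look like this:

-- union removes duplicates through an implicit distinct (extra work)
select shipcity from orders where freight > 100
union
select shipcity from orders where shipcountry = 'brazil'

-- union all skips the distinct and simply concatenates the two results
select shipcity from orders where freight > 100
union all
select shipcity from orders where shipcountry = 'brazil'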
5. Replace sub-queries with joins.
Many Transact-SQL statements that make use of sub-queries can be rewritten using joins. It is true that much of the time you will see no performance difference between sub-queries and joins, but in some cases, for example when the existence of a value needs to be verified, using a join produces better results.
So, whenever possible, look to replace your sub-queries with joins. In Listing 4 we have two SELECT statements, one written with a sub-query and the other with a join.

Listing 4. SELECT statements using a sub-query and a join


-- select instruction using sub-query
select productid,supplierid, productname
from products where supplierid in
(select supplierid from suppliers
where (country = 'brazil'))

-- select instruction using join


select prd.productid, prd.supplierid, prd.productname
from products prd inner join suppliers sup
on prd.supplierid = sup.supplierid
where sup.country = 'brazil'

Observe that, when executed, both produce the same result: a list of all the products from suppliers in Brazil.
6. In the WHERE clause, do not use indexed columns inside functions.
The simplest way to prevent a column's index from being used is to put the column inside a function! In SQL Server, the use of the substring function in the WHERE clause is very common; however, what very few know is that when you place an indexed column inside a function, SQL Server ends up not using the index properly and often does not use it at all.
In these situations, the best thing to do is to move the function to the other side of the comparison in the WHERE clause or, if possible, not use it at all. In Figure 8 we have an example of how using a function on an indexed column in the WHERE clause can prevent SQL Server from using the index correctly.

Figure 8. Example of queries using the substring function and the LIKE operator
The two queries presented in Figure 8 aim to obtain all employees whose first name starts with the characters "ma". Observe that in the first query the substring function is used to break the fname column, taking only the first two characters and comparing them to the string "ma".
Since SQL Server cannot know in advance which characters the function will produce, the function has to be evaluated for each of the table's rows, and SQL Server ends up performing an index scan, sweeping all the index pages sequentially.
In the second query, the function was replaced by the LIKE operator. In this case, since the indexed column is left untouched, SQL Server manages to use the index properly, performing an index seek in the index pages.
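Figure 8 itself is not reproduced here, but the two forms it compares look roughly like the sketch below, assuming an index on the fname column:

-- non-sargable: the function hides fname from the optimizer (index scan)
select fname, lname from employee where substring(fname, 1, 2) = 'ma'

-- sargable: the bare column with a known prefix allows an index seek
select fname, lname from employee where fname like 'ma%'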
7. Whenever possible, try to use operators that can be indexed.
Similar to the problem of using indexed columns inside functions, there is also a set of operators that, when used, can prevent SQL Server from using the index properly. These are known as non-indexable operators.
The positive operators are generally indexable: =, >, >=, <, <=, BETWEEN, and LIKE when used in the form LIKE 'word%'. The negative operators are generally not indexable: <>, NOT, NOT EXISTS, NOT IN, NOT LIKE, and LIKE when used in the form LIKE '%word'.
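A quick sketch of the LIKE case, assuming an index on the au_lname column of the pubs authors table:

-- indexable: the known prefix lets the optimizer seek into the index
select au_lname from authors where au_lname like 'ring%'

-- not indexable: a leading wildcard forces a scan of the whole index
select au_lname from authors where au_lname like '%ring'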
8. When you need to obtain the number of records in a table, avoid using SELECT COUNT(*).
Whenever we need to obtain the number of records in a table, the first T-SQL statement that comes to mind is: "select count(*) from table".
The problem with this statement is that, most of the time, it performs a table or index scan to return the number of records in the table. For large tables this is synonymous with a slow query and high consumption of server resources.
A simpler way to perform the same task without causing impact is to use the system table called sysindexes. This table has a column called rows which stores the total number of records for each table in your database.
So, whenever possible, use the T-SQL statement that follows to obtain the number of records in a table.

select rows from sysindexes
where id = object_id('table_name') and indid < 2
Since the sysindexes table does not store table names, only their ids, the statement uses the object_id() function so that SQL Server can, from the table's name, identify its respective id inside the sysindexes table.
9. Always use SET NOCOUNT ON inside your stored procedures.
This is a "best practice" that I rarely see developers use. Certainly, when running T-SQL statements such as SELECT, INSERT, UPDATE, and DELETE, you have already seen the "nn row(s) affected" message as part of your query's result.
Maybe you do not know it, but this apparently harmless message can have a great impact on the performance of your stored procedures. This is because when you execute an SP that contains several T-SQL statements, this message is sent to the client for every statement inside the SP, which ends up generating unnecessary network traffic.
The SET NOCOUNT ON option disables the sending of these messages. With it, SPs that contain several T-SQL statements may show a significant performance improvement, since network traffic will be greatly reduced.
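A minimal sketch of the practice (the procedure name and the queries inside it are hypothetical):

create procedure usp_list_authors
as
set nocount on
-- several statements follow; none of them sends an
-- "nn row(s) affected" message back to the client
select au_id, au_fname, au_lname from authors
select count(*) from titleauthor
go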
10. When creating a composite index, order the columns in the index so as to satisfy the WHERE clause of most of your queries.
One thing that must always be kept in mind when working with composite indexes (indexes with two or more of the table's columns) is that the index will only be used by a query if the first column of the composite index's key is specified in the WHERE clause.
Therefore, when working with composite indexes, the order of the columns in the index is very important. For a better understanding, let us look at an example: assume that your database has a table called tb_employee and that this table has an index composed of the columns last_name and first_name, in that order. When analyzing the execution plan of a query whose WHERE clause is:

where last_name='pinheiro'

you will see that the index was adequately used by SQL Server's query optimizer. But when analyzing the plan of a query whose WHERE clause is:

where first_name='nilton'

you will see that the index was not used. Therefore, when using composite indexes, make sure the WHERE clause always includes the first column of the index.
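A short sketch of both situations, using the tb_employee table from the example (the index name is an assumption):

create index idx_employee_name on tb_employee (last_name, first_name)

-- the index can be used: the leading column appears in the filter
select * from tb_employee where last_name = 'pinheiro'

-- the index cannot be seeked: the leading column is absent from the filter
select * from tb_employee where first_name = 'nilton'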
Conclusion
As we have seen, there are countless resources and techniques we can use to get more out of SQL Server. In general, when we work on query performance problems, one of the basic points is to ensure that the tables have indexes and, above all, that the queries are making proper use of those indexes.
Analyzing the execution plan can help with this. Another important point is to guarantee that the statistics are always kept up to date. Remember that the SQL Server optimizer uses the statistics to make its decisions. Regards, and see you next time.
