
3/4/2017 HowtoexploitMySQLindexoptimizationsBaronSchwartz'sBlog

Baron Schwartz's Blog

How to exploit MySQL index optimizations
Tue, Jul 4, 2006 in Databases

I've written a lot recently about MySQL index and table structure, primary keys, surrogate keys, and related optimizations. In this article I'll explain how MySQL's index structures enable an extremely important query optimization, and how that differs between storage engines. I'll also show you how to know and predict when the optimization is triggered, how to design tables and queries so it'll be used, and how to avoid defeating it with poor practices. Plus, I'll peek a bit into InnoDB internals to show you what's going on behind the scenes.

A review of MySQL's primary and secondary indexes


You need to understand how MySQL's indexes work, and how InnoDB's indexes differ from those of other storage engines, such as MyISAM, because if you don't, you can't design tables effectively.

The InnoDB storage engine creates a clustered index for every table. If the table has a primary key, that is the clustered index. If not, InnoDB uses the first UNIQUE index whose columns are all NOT NULL; if there is no such index, it internally assigns a six-byte unique ID to every row and uses that as the clustered index. (Moral of the story: pick a primary key of your own; don't let InnoDB generate a useless one for you.)
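The selection rule can be written as a tiny decision function. This is a toy Python model, not InnoDB source code. It includes one step that comes from the MySQL manual: with no primary key, InnoDB first tries the first UNIQUE index whose columns are all NOT NULL, and only then falls back to the hidden row ID. The return labels and the `email` column in the examples are hypothetical.

```python
from typing import AbstractSet, Optional, Sequence


def choose_clustered_index(
    primary_key: Optional[Sequence[str]],
    unique_indexes: Sequence[Sequence[str]],
    not_null_columns: AbstractSet[str],
) -> str:
    """Toy model of how InnoDB picks the index it clusters the table on."""
    # 1. A declared primary key always wins.
    if primary_key:
        return "PRIMARY"
    # 2. Otherwise, the first UNIQUE index (in definition order) whose
    #    columns are all NOT NULL is used.
    for cols in unique_indexes:
        if all(col in not_null_columns for col in cols):
            return "UNIQUE(" + ", ".join(cols) + ")"
    # 3. Otherwise, InnoDB clusters on a hidden, generated 6-byte row ID.
    return "GEN_CLUST_INDEX"


# A table with no keys at all, like the `test` table later in this article:
print(choose_clustered_index(None, [], set()))               # GEN_CLUST_INDEX
# A UNIQUE index on a nullable column doesn't qualify...
print(choose_clustered_index(None, [["email"]], set()))      # GEN_CLUST_INDEX
# ...but one on a NOT NULL column does.
print(choose_clustered_index(None, [["email"]], {"email"}))  # UNIQUE(email)
```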

All indexes are B-trees. In InnoDB, the primary key's leaf nodes are the data. Secondary indexes have a pointer to the data at their leaf nodes. A picture is worth a thousand words, so here's a diagram of the table structure I'll use later on in this article:

[Diagram: InnoDB clustered (primary) and secondary index structure for the example table; image not included]



https://www.xaprb.com/blog/2006/07/04/howtoexploitmysqlindexoptimizations/ 1/11

MyISAM has no clustered index, so the data isn't physically ordered by any index (it's in insertion order), but in InnoDB, the rows are physically ordered by the primary key. That means there can be page splits as rows are inserted between other rows: if there are too many rows to fit on a page, the page has to be split. MyISAM doesn't have that problem, because rows don't get stuffed between other rows (they are added at the end), so a secondary index's leaf nodes always point directly to the row in the table. In fact, there's no functional difference between primary and secondary keys in MyISAM. A MyISAM primary key is simply a unique index named PRIMARY.
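To make the MyISAM layout concrete, here's a toy Python model. It's a sketch under simplifying assumptions, not MyISAM's on-disk format: rows are appended to a list that stands in for the data file, and every index, including the one named PRIMARY, is a sorted list of (key, row offset) pairs. Because rows never move in this model, the offsets never go stale.

```python
import bisect
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]
Key = Tuple[Any, ...]


class MyISAMTable:
    """Toy model: a heap of rows plus indexes that map keys to row offsets."""

    def __init__(self, columns: List[str], indexes: Dict[str, Tuple[str, ...]]):
        self.columns = columns
        self.index_cols = indexes
        self.rows: List[Row] = []  # stands in for the data file
        # index name -> sorted list of (key, row offset); PRIMARY is no different
        self.indexes: Dict[str, List[Tuple[Key, int]]] = {name: [] for name in indexes}

    def insert(self, row: Row) -> None:
        offset = len(self.rows)  # new rows are appended, so offsets never change
        self.rows.append(row)
        for name, cols in self.index_cols.items():
            key = tuple(row[self.columns.index(c)] for c in cols)
            bisect.insort(self.indexes[name], (key, offset))

    def lookup(self, index_name: str, key: Key) -> List[Row]:
        entries = self.indexes[index_name]
        i = bisect.bisect_left(entries, (key,))
        found = []
        while i < len(entries) and entries[i][0] == key:
            found.append(self.rows[entries[i][1]])  # one hop straight to the row
            i += 1
        return found


apples = MyISAMTable(
    ["variety", "note", "price"],
    {"PRIMARY": ("variety",), "price": ("price",)},
)
for row in [("gala", "hello", 5), ("fuji", "hello", 6), ("limbertwig", "hello", 8),
            ("red delicious", "hello", 3), ("pippin", "hello", 8),
            ("granny smith", "hello", 11), ("roma", "hello", 6)]:
    apples.insert(row)

print(apples.lookup("price", (8,)))         # [('limbertwig', 'hello', 8), ('pippin', 'hello', 8)]
print(apples.lookup("PRIMARY", ("gala",)))  # [('gala', 'hello', 5)]
```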

Here's a picture of the equivalent table structure, using the MyISAM engine. Notice how different it is from InnoDB! This is the same table; it's just a different storage engine.

[Diagram: MyISAM index structure for the same table; image not included]

Why doesn't InnoDB just point to the rows like MyISAM? If InnoDB used that strategy, it would have to rewrite all the secondary indexes at every page split, when the rows get moved to a different location on disk. To avoid that cost, InnoDB uses the values from the primary key as its secondary indexes' leaf nodes. That makes the secondary indexes independent of the physical order of the primary key, but the pointer isn't a pointer directly to the row as in MyISAM. It also means secondary index lookups are more expensive than primary key lookups, because any secondary index lookup only results in a tuple that can be used to navigate the primary key: double work. MyISAM doesn't have that issue. Of course, it doesn't have rows in index order, either; and the primary key might be deeper. It's a trade-off.
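Here's the InnoDB side of the same idea as a toy Python model, again a simplified sketch rather than InnoDB's real structures: the clustered index is a sorted list of (primary key, row) pairs, and the secondary index on price holds (price, primary key) pairs. The hypothetical `descents` counter shows the double work: fetching whole rows through the secondary index costs one descent of the secondary index plus one descent of the clustered index per matching row.

```python
import bisect
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


class InnoDBTable:
    """Toy model: a clustered index whose leaves are the rows, plus one
    secondary index whose leaves hold primary key values, not row addresses."""

    def __init__(self, columns: List[str], pk_col: str, sec_col: str):
        self.columns = columns
        self.pk_pos = columns.index(pk_col)
        self.sec_pos = columns.index(sec_col)
        self.clustered: List[Tuple[Any, Row]] = []  # sorted (pk, row)
        self.secondary: List[Tuple[Any, Any]] = []  # sorted (secondary value, pk)
        self.descents = 0                           # B-tree navigations performed

    def insert(self, row: Row) -> None:
        pk = row[self.pk_pos]
        bisect.insort(self.clustered, (pk, row))
        bisect.insort(self.secondary, (row[self.sec_pos], pk))

    def _row_by_pk(self, pk: Any) -> Row:
        self.descents += 1  # second navigation: down the clustered index
        i = bisect.bisect_left(self.clustered, (pk,))
        return self.clustered[i][1]

    def rows_where(self, value: Any) -> List[Row]:
        """Fetch whole rows through the secondary index: the double-work path."""
        self.descents += 1  # first navigation: down the secondary index
        i = bisect.bisect_left(self.secondary, (value,))
        rows = []
        while i < len(self.secondary) and self.secondary[i][0] == value:
            rows.append(self._row_by_pk(self.secondary[i][1]))
            i += 1
        return rows


apples = InnoDBTable(["variety", "note", "price"], pk_col="variety", sec_col="price")
for row in [("gala", "hello", 5), ("fuji", "hello", 6), ("limbertwig", "hello", 8),
            ("red delicious", "hello", 3), ("pippin", "hello", 8),
            ("granny smith", "hello", 11), ("roma", "hello", 6)]:
    apples.insert(row)

print(apples.rows_where(8))  # [('limbertwig', 'hello', 8), ('pippin', 'hello', 8)]
print(apples.descents)       # 3: one secondary descent plus one clustered descent per row
```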

Secondary index optimizations


So there's a cost to secondary indexes in InnoDB. There's an optimization too. Once a query navigates to the leaf node of a secondary index, it knows two things: the values it used to navigate the index, and the primary key values of that row in the table.

For example, suppose I have a table structured like this:

create table apples (
   variety varchar(20) primary key,
   note varchar(50),
   price int,
   key(price)
) engine=InnoDB;

insert into apples values
   ('gala', 'hello', 5),
   ('fuji', 'hello', 6),
   ('limbertwig', 'hello', 8),
   ('red delicious', 'hello', 3),
   ('pippin', 'hello', 8),
   ('granny smith', 'hello', 11),
   ('roma', 'hello', 6);

Note that only the gala row has a price of 5. Now suppose I issue the following query:

select variety from apples where price = 5;


The query takes the value 5 and navigates the price index. When it gets to the leaf node, it finds the value gala, which it can use to navigate the primary key. But why does it need to do that? It already has the value it was looking for!

In fact, if the query only refers to values in the secondary index and the primary key, it doesn't need to leave the secondary index. If you like fancy lingo, the index covers the query, so it is a covering index or index cover.

This is a fantastic optimization. It means each secondary index is like another table, clustered on the index's own columns first and then on the primary key. In this example, the secondary index is like a table containing just price and variety, clustered in that order (refer again to the diagrams above).
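You can see the "index as another table" idea by building the leaf level of the price index for the apples data as a sorted Python list of (price, variety) tuples. This is only an illustration (a list stands in for the B-tree), and `varieties_priced` is a hypothetical helper, but it answers `select variety from apples where price between ? and ?` without ever touching the full rows.

```python
import bisect
from typing import List

apples = [
    ("gala", "hello", 5), ("fuji", "hello", 6), ("limbertwig", "hello", 8),
    ("red delicious", "hello", 3), ("pippin", "hello", 8),
    ("granny smith", "hello", 11), ("roma", "hello", 6),
]

# Leaf level of the secondary index on price, viewed as its own little table:
# (price, variety) pairs sorted by price first, then by the primary key.
price_index = sorted((price, variety) for variety, _note, price in apples)


def varieties_priced(lo: int, hi: int) -> List[str]:
    """select variety from apples where price between lo and hi, using only price_index."""
    i = bisect.bisect_left(price_index, (lo,))  # seek to the first entry with price >= lo
    result = []
    while i < len(price_index) and price_index[i][0] <= hi:
        result.append(price_index[i][1])  # variety comes straight from the index entry
        i += 1
    return result


print(price_index[:3])         # [(3, 'red delicious'), (5, 'gala'), (6, 'fuji')]
print(varieties_priced(5, 5))  # ['gala']
print(varieties_priced(6, 8))  # ['fuji', 'roma', 'limbertwig', 'pippin']
```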

In MyISAM, the "don't leave the index" optimization can be used too, but only if the query refers only to values in the index itself, because MyISAM indexes don't have any PK values at their leaf nodes. A MyISAM index can't be used to find any additional data without following the pointer to the row itself. Again, it's a trade-off.
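The difference between the two engines boils down to which columns are available in an index entry. The check below is a simplified sketch under that assumption: `query_columns` means every column the query touches (SELECT list, WHERE clause, ORDER BY, and so on), and it ignores details such as column-prefix indexes, which can't cover a query. It only says whether the optimization is possible, not whether the optimizer will pick it.

```python
from typing import Iterable, Sequence


def index_covers(
    query_columns: Iterable[str],
    index_columns: Sequence[str],
    pk_columns: Sequence[str],
    engine: str,
) -> bool:
    """Could a query that touches `query_columns` be answered from this index alone?"""
    available = set(index_columns)
    engine = engine.lower()
    if engine == "innodb":
        # InnoDB secondary index entries also carry the primary key columns.
        available |= set(pk_columns)
    elif engine != "myisam":
        raise ValueError(f"unknown engine: {engine}")
    return set(query_columns) <= available


# select variety from apples where price = 5, with an index on (price):
print(index_covers(["variety", "price"], ["price"], ["variety"], "InnoDB"))  # True
print(index_covers(["variety", "price"], ["price"], ["variety"], "MyISAM"))  # False
# select * from apples where price = 5:
print(index_covers(["variety", "note", "price"], ["price"], ["variety"], "InnoDB"))  # False
```

The last line is the `SELECT *` problem in miniature: asking for every column defeats the optimization even in InnoDB.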

How to know when the optimization is used


Theoretically, the optimization can be used any time a query only uses values from the primary key and a secondary index in InnoDB, or only uses values from the index itself in MyISAM. That doesn't mean the query will use that index, though. For a variety of reasons, the query might use some other index. To find out for sure, EXPLAIN the query. If the Extra column includes the text "Using index", the optimization is being used.
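If you check EXPLAIN output from a script, match the whole "Using index" item rather than searching for a substring: MySQL 5.6 and later can print "Using index condition" (index condition pushdown), which does not mean the query reads only the index. The helper below is a hypothetical sketch; getting the Extra value out of the server is left to whatever client library you use.

```python
from typing import Optional


def extra_says_covering(extra: Optional[str]) -> bool:
    """True if an EXPLAIN Extra value contains the exact item 'Using index'."""
    if not extra:  # Extra can be empty or NULL (None)
        return False
    # Items are separated by semicolons, e.g. "Using where; Using index".
    items = [item.strip() for item in extra.split(";")]
    # Compare whole items, so "Using index condition" (index condition pushdown)
    # is not mistaken for an index-only read.
    return "Using index" in items


print(extra_says_covering("Using where; Using index"))  # True
print(extra_says_covering("Using index condition"))     # False
print(extra_says_covering(None))                        # False
```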

How to design indexes for this optimization


Once you understand how indexes work, you can make
deliberate decisions about indexes. Here is a methodical
approach to designing indexes.

Begin with a table and the data it needs, but without any indexes except those designed to constrain the data to valid values (primary and unique indexes). Next, consider the queries that are issued against the table. Is it queried ad hoc, or do certain types of queries happen repeatedly? This is very important to know.

Before you start, consider the size of the table and how much it is used. You should put your optimization effort where it is most needed. If one 5-minute query runs once a day and you know it should be possible to optimize it to 5 seconds, that's 4 minutes and 55 seconds saved per day. If another query issued every minute takes 5 seconds and you know it should be possible to run it in a few milliseconds, that's about 7,000 seconds saved per day (1,440 runs, each almost 5 seconds faster). You should optimize the second query first. You should also consider carefully designed archiving jobs to get those tables as small as possible. Smaller tables are a huge optimization.
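The arithmetic behind that comparison, as a quick Python sketch. The article only says "a few milliseconds", so the 5 ms figure here is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_saved_per_day(before_s: float, after_s: float, runs_per_day: float) -> float:
    """Total time saved per day by making one query faster."""
    return (before_s - after_s) * runs_per_day


# 5-minute query, once a day, optimized to 5 seconds.
daily_report = seconds_saved_per_day(5 * 60, 5, runs_per_day=1)
# 5-second query, once a minute, optimized to an assumed 5 ms.
every_minute = seconds_saved_per_day(5, 0.005, runs_per_day=SECONDS_PER_DAY / 60)

print(daily_report)         # 295
print(round(every_minute))  # 7193
```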

Now, back to the index design discussion. If the table is queried ad hoc all the time, you need to create generally useful indexes. Most of the time you should examine the data to figure out what they should be. Pretend you're optimizing the apples table above. This table probably does not need an index on the note column. Look at its contents: every row just says hello. Indexing that would be a total waste. Plus, it just seems reasonable that you want to look at the note, but not filter by it. On the other hand, it's very reasonable that you'd want to find apples by price. The price index is probably a good choice.
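One rough way to "examine the data" is to compare the number of distinct values in a column with the number of rows, which in SQL is something like `select count(distinct note) / count(*) from apples`. Here's the same calculation over the sample rows in Python; the `selectivity` helper is hypothetical and is only a heuristic, not what the optimizer actually computes.

```python
from typing import Any, Sequence, Tuple

apples = [
    ("gala", "hello", 5), ("fuji", "hello", 6), ("limbertwig", "hello", 8),
    ("red delicious", "hello", 3), ("pippin", "hello", 8),
    ("granny smith", "hello", 11), ("roma", "hello", 6),
]


def selectivity(rows: Sequence[Tuple[Any, ...]], column: int) -> float:
    """Distinct values divided by row count: 1.0 means every value is unique."""
    values = [row[column] for row in rows]
    return len(set(values)) / len(values)


print(f"note:  {selectivity(apples, 1):.2f}")  # note:  0.14
print(f"price: {selectivity(apples, 2):.2f}")  # price: 0.71
```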

On the other hand, if you know there's a certain query that happens all the time and needs to be very fast, you should consider specially optimized indexes. Suppose these two queries each run 50 times a second:

select variety from apples where price = ?;
select note from apples where price = ?;

These queries deserve a close look. The optimization strategy will depend on the table size and the storage engine.

If you are using the InnoDB engine, the first query is already optimized, as we've seen above. It will use the price index and not even look at the table itself. If you're using the MyISAM engine, you need to consider how large the table is, and therefore how large an index on (price, variety) would be. If the table is very large, for example, if there are a bunch of large VARCHAR columns in it, that index might be significantly faster than all the bookmark lookups required to find the variety column for each row found in an index that only contains the price column.

The second query is trickier to optimize, because it really depends on how large the table is. If the table is very large, and has lots of other columns as I mentioned in the previous paragraph, it might make sense to create an index on (price, note). This is where careful testing is needed. I will explain how to do that testing in an upcoming article. It is non-trivial in MySQL, unfortunately.

The general strategy is as follows:

1. For InnoDB, put the columns in the WHERE clause first in the index, then add the columns named in the SELECT clause at the end, unless they are included in the primary key.
2. For MyISAM, put the columns in the WHERE clause first in the index, then add the columns named in the SELECT clause at the end.
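The two rules can be sketched as a small Python function. It is a hypothetical helper that only encodes the rules as stated: it doesn't decide the order among several WHERE columns, doesn't estimate index size, and doesn't replace careful testing.

```python
from typing import List, Sequence


def suggest_index(
    where_cols: Sequence[str],
    select_cols: Sequence[str],
    pk_cols: Sequence[str],
    engine: str,
) -> List[str]:
    """Column order for a covering index, following the two rules in the list."""
    innodb = engine.lower() == "innodb"
    index: List[str] = []
    # WHERE columns go first, in the order given.
    for col in where_cols:
        if col not in index:
            index.append(col)
    # Then the SELECT columns, except (for InnoDB) those already in the primary key.
    for col in select_cols:
        if col in index or (innodb and col in pk_cols):
            continue
        index.append(col)
    return index


print(suggest_index(["price"], ["variety"], ["variety"], "InnoDB"))  # ['price']
print(suggest_index(["price"], ["variety"], ["variety"], "MyISAM"))  # ['price', 'variety']
print(suggest_index(["price"], ["note"], ["variety"], "InnoDB"))     # ['price', 'note']
```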

How to write queries that don't suck


I've noticed many people have a tendency to write SELECT * FROM... queries. If you don't need all the columns, don't select all the columns, because it can make the difference between a fast and a slow query. If you only select the columns you need, your query might be able to use one of the optimizations I've just explained. If you select every column and the query uses a secondary index, there's no way to do that, and the query will have to wander around indexes finding the rows it needs, then do other operations to get the actual values from the rows.

Of course, if you only need a few columns, leaving out the ones you don't need can also mean a lot less data. Getting that data off the disk and sending it to whatever asked for it is significant overhead. Don't do it unless you need to.

Other InnoDB index design considerations


Since InnoDB secondary indexes already contain all columns from the primary key, there's no need to add them to the secondary index unless the index needs them at the front of the index. In particular, adding an index on (price, variety) to the apples table above is completely redundant. And in tables where the primary key is several columns and it's desirable to have the table "clustered two ways" by using the indexes as I've explained, not all of the columns need to be added to additional indexes. Indexes need to be designed very carefully to avoid causing a bunch of extra overhead. Every index adds a cost to the table, and it's really important to avoid indexes that add cost but no benefit.
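One way to spot a trailing column that adds nothing is to compare the columns each index entry makes available, in sort order, with and without that column, given that InnoDB secondary indexes already carry the primary key after the declared columns. This is a simplified Python sketch under that assumption; both helpers and the columns `x`, `a`, `b` are hypothetical. With a multi-column primary key, a trailing PK column listed out of PK order still changes the sort order, so the sketch does not report it as droppable.

```python
from typing import List, Sequence


def entry_columns(index_cols: Sequence[str], pk_cols: Sequence[str]) -> List[str]:
    """Columns an InnoDB secondary index entry makes available, in sort order:
    the declared columns, then any primary key columns not already declared."""
    return list(index_cols) + [c for c in pk_cols if c not in index_cols]


def droppable_tail(index_cols: Sequence[str], pk_cols: Sequence[str]) -> List[str]:
    """Trailing columns that can be dropped without changing what the entries
    contain or how they are ordered."""
    cols = list(index_cols)
    dropped: List[str] = []
    while len(cols) > 1 and entry_columns(cols[:-1], pk_cols) == entry_columns(cols, pk_cols):
        dropped.insert(0, cols.pop())
    return dropped


print(droppable_tail(["price", "variety"], ["variety"]))  # ['variety']
print(droppable_tail(["x", "a"], ["a", "b"]))             # ['a']
print(droppable_tail(["x", "b"], ["a", "b"]))             # [] (b changes the sort order)
```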

Suppose you added an index on (price, variety) to the apples table anyway. You might think the variety column can just be optimized out of the internal nodes, since the values are already at the leaf nodes. It can't, because the primary key values are only at the leaf nodes, not in the internal nodes, and they can't be optimized out of the internal nodes because they're needed for navigating the index. Again, adding that column to the end of the index will just make the index larger, and the query will know nothing it didn't already know: that's useless.

I want to point out that it's not always possible to design indexes so this optimization can be used! It is not necessarily a good design goal to make sure every query can be satisfied without leaving the indexes. In fact, it's unrealistic. But in special cases, it may be possible and worth doing.

Another InnoDB optimization


Here's another neat optimization: a tiny index might be used unexpectedly. For example,

create table something (
   id bigint not null auto_increment primary key,
   is_something tinyint not null,
   othercol_1 bigint not null,
   othercol_2 bigint not null,
   othercol_3 bigint not null,
   index(is_something)
) engine=InnoDB;

is_something is a 1/0 indicator of whether something is true about the row. Normally I'd say an index on that is a waste of disk and CPU, because it's not selective enough for the query optimizer to use it, assuming there's an equal distribution of ones and zeroes. But the fact that it's a very small value is important for some queries. For example, select sum(id) from something will scan the is_something index because it's the smallest available. Its internal nodes only have one-byte tinyint values, and the leaf nodes have a tinyint and an 8-byte bigint. That's much smaller than the clustered index, which has 8-byte values in the internal nodes, and 33 bytes of column data at each leaf.
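Here are the byte counts in that comparison, following the article's accounting. This simplified Python sketch counts only column values: it ignores per-record and per-page overhead, child-page pointers, and the hidden transaction-ID and rollback-pointer columns that InnoDB also stores in clustered index records (they show up in the lock output later in this article). The leaf ratio is what matters for a full scan such as `select sum(id) from something`.

```python
BIGINT, TINYINT = 8, 1  # column sizes in bytes

# Per-entry sizes, counting column values only, as in the text above.
clustered_internal = BIGINT                     # id
clustered_leaf = BIGINT + TINYINT + 3 * BIGINT  # the whole row
secondary_internal = TINYINT                    # is_something
secondary_leaf = TINYINT + BIGINT               # is_something plus the id from the PK

print(clustered_internal, clustered_leaf)  # 8 33
print(secondary_internal, secondary_leaf)  # 1 9
print(f"{clustered_leaf / secondary_leaf:.1f}x as many leaf bytes in the clustered index")
```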

Proof of InnoDB's automatic clustered index


I said every InnoDB table without a primary key gets a clustered index on an internally generated 6-byte value. Here's a neat way to see that in action. I created a table like so:

create table test(a int, b int, c int) engine=InnoDB;

insert into test values(1,1,1),(2,2,2);


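I started a transaction and locked the table's rows (at the SERIALIZABLE isolation level, InnoDB treats a plain SELECT as a locking read that takes shared locks), then started another transaction on a different connection and tried to update that table: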

-- connection 1:
set transaction isolation level serializable;
start transaction;
select * from test;

-- connection 2:
set transaction isolation level serializable;
start transaction;
update test set a=5;

The query blocked and waited for a lock to be granted. Then I issued SHOW ENGINE INNODB STATUS on another connection. The transaction information shows the lock on the internally generated index:

TRANSACTION 0 81411, ACTIVE 1410 sec, process no 8799, OS thread id 1141414240 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1216
MySQL thread id 4, query id 194 localhost xaprb Updating
update test set a=5
TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 131074 n bits 72 index `GEN_CLUST_INDEX` of table `test/test` trx id 0 81411 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 6; hex 000000018a02; asc ;; 1: len 6; hex 000000013e0a; asc >;; 2: len 7; hex 80000000320110; asc 2 ;;
 3: len 4; hex 80000001; asc ;; 4: len 4; hex 80000001; asc ;; 5: len 4; hex 80000001; asc ;;

Notice the lock on the index called GEN_CLUST_INDEX. Notice also the number of fields (n_fields) in the lock struct: three more than the number of columns in the table. The first field in the index is the internally generated unique value, and it is 6 bytes as I said above; the next two are InnoDB's hidden transaction ID and rollback pointer.
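The field lengths in that dump follow a simple pattern. Here is a simplified Python model of the clustered index record layout, assuming fixed-size columns and at most a one-column primary key. DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR are the names the MySQL manual uses for InnoDB's hidden columns; the helper itself is hypothetical. It reproduces the lengths above and, for a primary key on a, the five-field layout in the next dump.

```python
from typing import List, Optional, Sequence, Tuple

Field = Tuple[str, int]  # (name, length in bytes)

# InnoDB's hidden system columns in clustered index records.
DB_ROW_ID: Field = ("DB_ROW_ID", 6)
DB_TRX_ID: Field = ("DB_TRX_ID", 6)
DB_ROLL_PTR: Field = ("DB_ROLL_PTR", 7)


def clustered_record_fields(columns: Sequence[Field], pk: Optional[str] = None) -> List[Field]:
    """Field order of a clustered index record (fixed-size columns, 0 or 1 PK column)."""
    if pk is None:
        key = [DB_ROW_ID]  # no usable key: a generated row ID leads the record
        rest = list(columns)
    else:
        key = [c for c in columns if c[0] == pk]  # the PK column leads the record
        rest = [c for c in columns if c[0] != pk]
    return key + [DB_TRX_ID, DB_ROLL_PTR] + rest


test_columns = [("a", 4), ("b", 4), ("c", 4)]  # create table test(a int, b int, c int)
print([n for _, n in clustered_record_fields(test_columns)])          # [6, 6, 7, 4, 4, 4]
print([n for _, n in clustered_record_fields(test_columns, pk="a")])  # [4, 6, 7, 4, 4]
```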

If there is a primary key on a, it's a different story:



TRANSACTION 0 81456, ACTIVE 17 sec, process no 8799, OS thread id 1141680480 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1216
MySQL thread id 9, query id 277 localhost xaprb Updating
update test set a=5 where a=1
TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 131076 n bits 72 index `PRIMARY` of table `test/test` trx id 0 81456 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 80000001; asc ;; 1: len 6; hex 000000013e27; asc >';; 2: len 7; hex 80000000320110; asc 2 ;;
 3: len 4; hex 80000001; asc ;; 4: len 4; hex 80000001; asc ;;

Now the lock is on the index called PRIMARY, there are only 5 fields in the lock structure, and the first one is 4 bytes instead of 6. Fields with the value 1 have the hex value 80000001. When the primary key is a column of the table, that field comes first in the lock structure.
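Why is 1 shown as 80000001? InnoDB stores signed integers big-endian with the sign bit inverted, so that comparing the raw bytes gives the same order as comparing the numbers. Here's a small Python sketch of the decoding; the function name is hypothetical, and UNSIGNED columns are stored without the flip, so they wouldn't need it.

```python
def decode_innodb_int(hex_value: str) -> int:
    """Decode a signed integer field as shown in InnoDB's record dumps."""
    raw = bytes.fromhex(hex_value)
    bits = 8 * len(raw)
    sign_bit = 1 << (bits - 1)
    unsigned = int.from_bytes(raw, "big") ^ sign_bit  # undo the inverted sign bit
    # Reinterpret the result as a two's-complement signed number.
    return unsigned - (1 << bits) if unsigned & sign_bit else unsigned


print(decode_innodb_int("80000001"))  # 1
print(decode_innodb_int("80000002"))  # 2
print(decode_innodb_int("7fffffff"))  # -1
```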

These examples prove that InnoDB adds a hidden column to your tables when you don't create a primary key. Maybe I'm saying this too often, but you should always create a carefully designed primary key, because if you don't, you're throwing away one of the best things InnoDB gives you: a clustered index. Read my past articles for more on how to design an effective primary key.

Summary
The more you know about how indexes work, the more you can optimize your databases. Sometimes these optimizations don't help much, but sometimes they're huge. In this article I explained how InnoDB's primary and secondary indexes differ from those of other storage engines. Now that you understand the differences, you can understand the optimizations and trade-offs each storage engine has, and how to take advantage of the optimizations and avoid the drawbacks if possible. I showed you several side effects of the index design, such as a query scanning a secondary index instead of the table, and went into a bit of InnoDB internals to see how tables without primary keys work.

If this article helped you, you should consider subscribing via feeds or e-mail, because it's the best way to get my upcoming articles. I publish two or three times a week.

I'm Baron Schwartz, the founder and CEO of VividCortex. I am the author of High Performance MySQL and lots of open-source software for performance analysis, monitoring, and system administration. I contribute to various database communities such as Oracle, PostgreSQL, Redis and MongoDB.
