Monday, October 21, 2013

FAQ: InnoDB extended secondary keys

MySQL 5.6 introduced a new feature called extended secondary keys. We get a lot of questions about it and find that most of them come from a few incorrect assumption. In this post I'll try to get rid of the confusion once and for all. Famous last words... here goes:

Q1: Do I need to do anything to enable extended secondary keys?

No, nothing at all. It's on by default and I can't see any sensible reason why you would want to disable it. However, it is possible to disable it by tuning the optimizer_switch: SET optimizer_switch='use_index_extensions={on|off}'.

Q2: Does extended secondary keys only work with InnoDB?

No, it should work with any storage engine that uses the primary key columns as reference to the row, which means most storage engines with clustered primary keys. I say "should" because it requires a minimum of work from the storage engine provider; it must announce to MySQL that it is supported.


Oracle ships one storage engines which uses primary key as reference to the row and which supports the extended secondary key functionality: InnoDB. The other engines, like MyISAM, do not support it.

Q3: What does it mean to use the primary key as reference to the row?

A few basics first: A clustered key means that the data rows are physically stored in the order defined by the index. Theoretically, the rows may be stored in the index itself or in a separate file. InnoDB has a clustered primary key because the rows are physically stored in the primary key. A scan of the primary key and a table scan is therefore exactly the same thing in InnoDB.

A non-clustered index does not store the rows. Instead, each entry in the index contains the value of the indexed columns and some kind of reference to the actual row. By following this reference, the storage engine can find the row wherever it is stored. The reference may, e.g., be the row's physical location.

In the case of InnoDB and most storage engines with clustered primary keys, the reference is the primary key. Each and every InnoDB index therefore stores the primary key columns. This is not new; it has always been this way.

Q4: What was done to indexes in InnoDB to enable the extension?

Nothing. The indexes have not been modified. The only thing that happened in MySQL 5.6 was that the MySQL server was made fully aware of the existence of the primary key parts. A wide range of additional optimizations became possible due to this.

Q5: Are the primary key fields unnecessarily duplicated in the secondary keys?

No, the secondary keys contain exactly what is required for them to work, and the primary key fields are required because they function as the reference to the row. If a secondary key is created in such a way that one or more primary key fields are part of the key definition, then the index will not be extended with these fields.

In other words, if you create the 'lineitem' table from DBT-3 and add a few indexes:

CREATE TABLE lineitem (
  ...
  PRIMARY KEY (orderkey, linenumber), 
  INDEX shipdt_idx (shipdate),
  INDEX qty_idx (quantity, linenumber)
  ...
)

Then shipdt_idx really consists of (shipdate, orderkey, linenumber) and qty_idx consists of (quantity, linenumber, orderkey) - in that order and without duplicating linenumber.

However, there is an exception to every rule, and in our case it is this: If a secondary key contains a prefix of a primary key column, that column is duplicated. Thus, an index created like this: "INDEX prefix_idx (pk(1), my_field)" will be stored like this: (pk1(1), my_field, pk1,...,pkx). If column prefix doesn't ring a bell then nevermind - it's not important.


Q6: So there is really no difference between a secondary key defined with and without the primary key columns in InnoDB?

That depends, because the column order in the index definition matters. If you want the primary key columns at the end of the secondary index and in the same order as defined by the primary key, then there is no practical difference between explicitly adding them or not.
 
If you need those primary key columns "out of order" or in between other columns in the secondary key, then you have no choice but to add them explicitly.

And now for the fine print:
SHOW CREATE TABLE and SHOW INDEXES will only show columns explicitly covered by the index. 


Q7: Did MySQL make use of the secondary keys extension prior to 5.6?

Yes, to some extent. Prior to MySQL 5.6 the primary key columns could be used for two things: a) index-only reads, and b) avoid sorting.

Index-only reading is when all fields that need to be read can be found in the index. MySQL does not have to access the rows in the table in such cases, and that saves IO. To make you aware of it, EXPLAIN will show "Using index" in the Extra column. Consider the query:

SELECT linenumber FROM lineitem WHERE quantity>50;

For this query, index only access was possible (in 5.5. as well) by scanning the lineitem.qty_idx index defined above.

MySQL can avoid sorting by reading through an index since the index is stored in sorted order. The query:

SELECT * FROM lineitem WHERE quantity=50 ORDER BY linenumber;

could be resolved without sorting if the lineitem.qty_idx was scanned. Again, this was available in 5.5 and earlier.

Q8:  Which new optimizations are enabled in MySQL 5.6?

In  MySQL 5.6, the primary key fields are used as if the index was explicitly created with these columns. There are no exceptions, except those that are mentioned in Q6 :-)

The most important optimizations that can now make use of the primary key columns are:

Examples, please!

mysql> USE dbt3;

mysql> SHOW CREATE TABLE lineitem;
CREATE TABLE `lineitem` (
  `l_orderkey` int(11) NOT NULL DEFAULT '0',
  `l_suppkey` int(11) DEFAULT NULL,
  `l_linenumber` int(11) NOT NULL DEFAULT '0',
  `l_quantity` decimal(10,2) DEFAULT NULL,
  `l_shipDATE` date DEFAULT NULL,
  `l_shipDATE` date DEFAULT NULL,
  ... (cut for readability)
  PRIMARY KEY (`l_orderkey`,`l_linenumber`),
  KEY `i_l_shipdate` (`l_shipDATE`),
  ... (cut for readability)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> ##################
mysql> # MAX optimization
mysql> ##################

mysql> SET optimizer_switch='use_index_extensions=on';
mysql> FLUSH STATUS;
mysql> SELECT MAX(l_orderkey) FROM lineitem WHERE l_shipdate='1997-06-01';
+-----------------+
| MAX(l_orderkey) |
+-----------------+
|         5998529 |
+-----------------+
1 row in set (0.00 sec)

mysql> # handler_read calls show that lookup was performed
mysql> SHOW STATUS LIKE 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 2     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.04 sec)

mysql> SET optimizer_switch='use_index_extensions=off';
mysql> FLUSH STATUS;
mysql> SELECT MAX(l_orderkey) FROM lineitem WHERE l_shipdate='1997-06-01';
...
1 row in set (0.05 sec)

mysql> # handler_read calls show that index scan was performed
mysql> SHOW STATUS LIKE 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 2707  |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.05 sec)

mysql> ####################
mysql> # RANGE optimization
mysql> ####################

mysql> SET optimizer_switch='use_index_extensions=on';
mysql> FLUSH STATUS;
mysql> SELECT COUNT(*) FROM lineitem WHERE l_shipdate='1997-06-01' AND l_orderkey>5000000;
+----------+
| COUNT(*) |
+----------+
|      488 |
+----------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'handler_read_next';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_read_next | 488   |
+-------------------+-------+
1 row in set (0.05 sec)

mysql> SET optimizer_switch='use_index_extensions=off';
mysql> FLUSH STATUS;
mysql> SELECT COUNT(*) FROM lineitem WHERE l_shipdate='1997-06-01' AND l_orderkey>5000000;
...
1 row in set (0.02 sec)

mysql> SHOW STATUS LIKE 'handler_read_next';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_read_next | 2707  |
+-------------------+-------+
1 row in set (0.04 sec)


4 comments:

  1. Any plans to support extended keys for partitioned tables http://bugs.mysql.com/bug.php?id=70823 ?

    ReplyDelete
    Replies
    1. I remember we discussed this back when extended keys were implemented in the first place. An internal feature request was filed back then, but it seems that it didn't get the attention it deserved. I've notified the implementor and I hope that it will be addressed rather soon. Thanks for the bug report BTW.

      Delete
  2. Dear Jorgen, why secondary index store the primary key instead of the link to record?? It seems overhead to take a second lookup in the primary index. And I think the position of the record is seldom changed so I think storing the link to the location is better. Is that because the primary key is much smaller than the link number to the location?

    ReplyDelete
  3. It would be possible to store something like a pointer to a disk block and offset within the block, but that's not necessarily smaller than the currently used primary key fields. If the primary key for the table is a single integer, it will actually be shorter or equally long as block pointers, depending on the integer type. Besides, your assumption that the location seldom changes is only correct for some very specific workloads. In practice, you would end up with another indirection ("believed disk location" -> "actual disk location").

    Disclaimer: I didn't implement or design InnoDB, so you may be better off asking someone on that team if you need more details.

    ReplyDelete