MYSQL的Range Optimization主要還是基於索引。
The range
access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections give descriptions of conditions under which the optimizer uses range access.
For a single-part index, index value intervals can be conveniently represented by corresponding conditions in theWHERE
clause, so we speak of range conditions rather than “intervals.”
The definition of a range condition for a single-part index is as follows:
For both BTREE
and HASH
indexes, comparison of a key part with a constant value is a range condition when using the =
, <=>
, IN()
, IS NULL
, or IS NOT NULL
operators.
Additionally, for BTREE
indexes, comparison of a key part with a constant value is a range condition when using the >
, <
, >=
, <=
, BETWEEN
, !=
, or <>
operators, or LIKE
comparisons if the argument to LIKE
is a constant string that does not start with a wildcard character.
For all types of indexes, multiple range conditions combined with OR
or AND
form a range condition.
對於BTREE索引和HASH索引來說,索引的范圍優化基本上只適用於等值查詢。譬如=, <=>, IN(), IS NULL, IS NOT NULL操作符。
相對於HASH索引,BTREE索引同樣支持非等值查詢,譬如>, <, >=, <=, BETWEEN, !=, <>和LIKE(注意,like的常量值不能以通配符開頭)
“Constant value” in the preceding descriptions means one of the following:
A constant from the query string
A column of a const
or system
table from the same join
The result of an uncorrelated subquery
Any expression composed entirely from subexpressions of the preceding types
常量值一般指三種:查詢條件為常量,const表或system表,非關聯子查詢的結果
其中,const表指的是最多只有一個匹配行,譬如基於主鍵的查詢:
SELECT * FROM tbl_name WHERE primary_key=1; SELECT * FROM tbl_name WHERE primary_key_part1=1 AND primary_key_part2=2;
Here are some examples of queries with range conditions in the WHERE
clause:
SELECT * FROM t1 WHEREkey_col
> 1 ANDkey_col
< 10; SELECT * FROM t1 WHEREkey_col
= 1 ORkey_col
IN (15,18,20); SELECT * FROM t1 WHEREkey_col
LIKE 'ab%' ORkey_col
BETWEEN 'bar' AND 'foo';
Some nonconstant values may be converted to constants during the constant propagation phase.
以下是MySQL提取范圍條件的思路,並不是等價變換,目的還是在於盡可能的使用索引的范圍查詢,包括後面的紅色部分也提到,這樣變換後的條件會沒有原來的條件嚴格,MySQL這樣做的目的在於利用索引過濾掉很大一部分記錄,然後再對剩下的記錄進行額外的篩選。
MySQL tries to extract range conditions from the WHERE
clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.
Consider the following statement, where key1
is an indexed column and nonkey
is not indexed:
SELECT * FROM t1 WHERE (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z');
The extraction process for key key1
is as follows:
Start with original WHERE
clause:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z')
Remove nonkey = 4
and key1 LIKE '%b'
because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE
, so that we do not miss any matching rows when doing the range scan. Having replaced them with TRUE
, we get:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR (key1 < 'bar' AND TRUE) OR (key1 < 'uux' AND key1 > 'z')
Collapse conditions that are always true or false:
(key1 LIKE 'abcde%' OR TRUE)
is always true
(key1 < 'uux' AND key1 > 'z')
is always false
Replacing these conditions with constants, we get:
(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)
Removing unnecessary TRUE
and FALSE
constants, we obtain:
(key1 < 'abc') OR (key1 < 'bar')
Combining overlapping intervals into one yields the final condition to be used for the range scan:
(key1 < 'bar')
In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE
clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE
clause.
The range condition extraction algorithm can handle nested AND
/OR
constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in WHERE
clause.
MySQL does not support merging multiple ranges for the range
access method for spatial indexes. To work around this limitation, you can use a UNION
with identical SELECT
statements, except that you put each spatial predicate in a different SELECT
.
Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.
For example, consider a multiple-part index defined as key1(
, and the following set of key tuples listed in key order:key_part1
, key_part2
, key_part3
)
key_part1
key_part2
key_part3
NULL 1 'abc' NULL 1 'xyz' NULL 2 'foo' 1 1 'abc' 1 1 'xyz' 1 2 'abc' 2 1 'aaa'
The condition
defines this interval:key_part1
= 1
(1,-inf,-inf) <= (key_part1
,key_part2
,key_part3
) < (1,+inf,+inf)
The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.
By contrast, the condition
does not define a single interval and cannot be used by the range access method.key_part3
= 'abc'
The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.
For HASH
indexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:
key_part1
cmp
const1
ANDkey_part2
cmp
const2
AND ... ANDkey_partN
cmp
constN
;
Here, const1
, const2
, … are constants, cmp
is one of the =
, <=>
, or IS NULL
comparison operators, and the conditions cover all index parts. (That is, there are N
conditions, one for each part of an N
-part index.) For example, the following is a range condition for a three-part HASH
index:
key_part1
= 1 ANDkey_part2
IS NULL ANDkey_part3
= 'foo'
For the definition of what is considered to be a constant, see Section 8.2.1.3.1, “The Range Access Method for Single-Part Indexes”.
For a BTREE
index, an interval might be usable for conditions combined with AND
, where each condition compares a key part with a constant value using =
, <=>
, IS NULL
, >
, <
, >=
, <=
, !=
, <>
, BETWEEN
, or LIKE '
(where pattern
''
does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if pattern
'<>
or !=
is used).
The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =
, <=>
, or IS NULL
. If the operator is >
, <
, >=
, <=
, !=
, <>
, BETWEEN
, or LIKE
, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses =
from the first comparison. It also uses >=
from the second comparison but considers no further key parts and does not use the third comparison for interval construction:
key_part1
= 'foo' ANDkey_part2
>= 10 ANDkey_part3
> 10
The single interval is:
('foo',10,-inf) < (key_part1
,key_part2
,key_part3
) < ('foo',+inf,+inf)
It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0)
, which does not satisfy the original condition.
If conditions that cover sets of rows contained within intervals are combined with OR
, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined withAND
, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:
(key_part1
= 1 ANDkey_part2
< 2) OR (key_part1
> 5)
The intervals are:
(1,-inf) < (key_part1
,key_part2
) < (1,2) (5,-inf) < (key_part1
,key_part2
)
In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len
column in the EXPLAIN
output indicates the maximum length of the key prefix used.
In some cases, key_len
may indicate that a key part was used, but that might be not what you would expect. Suppose that key_part1
and key_part2
can be NULL
. Then the key_len
column displays two key part lengths for the following condition:
key_part1
>= 1 ANDkey_part2
< 2
But, in fact, the condition is converted to this:
key_part1
>= 1 ANDkey_part2
IS NOT NULL
Section 8.2.1.3.1, “The Range Access Method for Single-Part Indexes”, describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes.
Consider these expressions, where col_name
is an indexed column:
col_name
IN(val1
, ...,valN
)col_name
=val1
OR ... ORcol_name
=valN
Each expression is true if col_name
is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:
If there is a unique index on col_name
, the row estimate for each range is 1 because at most one row can have the given value.
Otherwise, the optimizer can estimate the row count for each range using dives into the index or index statistics.
With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression
has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.col_name
IN (10, 20, 30)
Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.
The eq_range_index_dive_limit
system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To disable use of statistics and always use index dives, set eq_range_index_dive_limit
to 0. To permit use of index dives for comparisons of up to N
equality ranges, set eq_range_index_dive_limit
to N
+ 1.
To update table index statistics for best estimates, use ANALYZE TABLE
.
As of MySQL 5.7.3, the optimizer is able to apply the range scan access method to queries of this form:
SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));
Previously, for range scans to be used it was necessary for the query to be written as:
SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' ) OR ( col_1 = 'c' AND col_2 = 'd' );
For the optimizer to use a range scan, queries must satisfy these conditions:
Only IN
predicates can be used, not NOT IN
.
There may only be column references in the row constructor on the IN
predicate's left hand side.
There must be more than one row constructor on the IN
predicate's right hand side.
Row constructors on the IN
predicate's right hand side must contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
Compared to similar queries executed before MySQL 5.7.3, EXPLAIN
output for applicable queries changes from full table or index scan to range scan. Changes are also visible by checking the values of theHandler_read_first
, Handler_read_key
, and Handler_read_next
status variables.