mysql> show create table tt \G *************************** 1. row *************************** Table: tt Create Table: CREATE TABLE `tt` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `name` char(16) NOT NULL DEFAULT '', `rule_id` int(10) unsigned NOT NULL DEFAULT '0', `status` tinyint(3) unsigned NOT NULL DEFAULT '1', PRIMARY KEY (`id`), KEY `ttx` (`name`,`rule_id`) ) ENGINE=InnoDB AUTO_INCREMENT=1176504 DEFAULT CHARSET=utf8 1 row in set (0.00 sec)
目前數據庫中數據大概有2000W條,之後可能還會不斷增長,現在想要查詢的是:從表中找出相同姓名,相同規則ID,並且status = 1對應的最後一條記錄 寫個程序隨機加入數據
<?php mysql_connect('10.0.0.234','root','123456'); mysql_select_db('testdb'); mysql_query("set names utf8"); $strs =''; $start = 'insert into tt values '; for($i=0;$i<3000000;$i++){ $k = range(a,z); shuffle($k); $str = implode('',$k); $num = mt_rand(8,16); $in = substr($str,0,$num); $strs .= " (NULL,'$in',$num,1),"; if($i%10000 == 0){ echo $i."\n"; $sql = trim($start.$strs,','); mysql_query($sql); $strs =''; } }
這樣300W數據就有了 目前的查詢語句是這樣的 SELECT * FROM tt WHERE id IN ( SELECT max(id) AS max_id FROM tt GROUP BY name, rule_id) AND status = 1 看一下查詢計劃 mysql> explain SELECT * FROM tt WHERE id IN ( SELECT max(id) AS max_id FROM tt GROUP BY name, rule_id) and status=1; +----+--------------------+-------+-------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+-------+---------------+------+---------+------+---------+-------------+ | 1 | PRIMARY | tt | ALL | NULL | NULL | NULL | NULL | 1176818 | Using where | | 2 | DEPENDENT SUBQUERY | tt | index | NULL | ttx | 52 | NULL | 1 | Using index | +----+--------------------+-------+-------+---------------+------+---------+------+---------+-------------+ 索引用上了,也沒有filesort,這是不是就很快了,然後執行查詢,就卡主了,卡主了,最後只能被Ctrl+C了 那麼這個不行,就換一個,這裡我想到了,子查詢換成join看一看效果 SELECT a.id,b.name,b.rule_id FROM (select max(id) as id from tt group by name, rule_id) as a left join tt b on a.id = b.id 看一下查詢計劃 mysql> explain SELECT a.id,b.name,b.rule_id FROM (select max(id) as id from tt group by name, rule_id) as a left join tt b on a.id = b.id; +----+-------------+------------+--------+---------------+---------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+---------+---------+------+---------+-------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1176503 | | | 1 | PRIMARY | b | eq_ref | PRIMARY | PRIMARY | 4 | a.id | 1 | | | 2 | DERIVED | tt | index | NULL | ttx | 52 | NULL | 1176818 | Using index | +----+-------------+------------+--------+---------------+---------+---------+------+---------+-------------+ 執行結果 1.77579775 sec mysql> show profiles; +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ | 6 | 1.77579775 | SELECT a.id,b.name,b.rule_id FROM (select max(id) as id from tt group by name, rule_id) as a left join tt b on a.id = b.id | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ 7 rows in set (0.00 sec) 看一看執行 過程
mysql> show profile for query 6; +------------------------------+----------+ | Status | Duration | +------------------------------+----------+ | Sending data | 0.000596 | | Waiting for query cache lock | 0.000002 | | Sending data | 0.000579 | | Waiting for query cache lock | 0.000002 | | Sending data | 0.000534 | | Waiting for query cache lock | 0.000002 | | Sending data | 1.101490 | | end | 0.000010 | | query end | 0.000004 | | closing tables | 0.000003 | | removing tmp table | 0.001369 | | closing tables | 0.000010 | | freeing items | 0.000024 | | logging slow query | 0.000002 | | logging slow query | 0.000035 | | cleaning up | 0.000003 | +------------------------------+----------+ 100 rows in set (0.00 sec)
太多的查詢緩存lock,關掉它,再看一下查詢結果
mysql> show profile for query 10; +---------------------------+----------+ | Status | Duration | +---------------------------+----------+ | starting | 0.000154 | | checking permissions | 0.000006 | | checking permissions | 0.000005 | | Opening tables | 0.000124 | | System lock | 0.000122 | | optimizing | 0.000007 | | statistics | 0.000035 | | preparing | 0.000018 | | executing | 0.000012 | | Sorting result | 0.000006 | | Sending data | 0.518897 | | converting HEAP to MyISAM | 0.070147 | | Sending data | 0.067123 | | init | 0.000033 | | optimizing | 0.000003 | | statistics | 0.000020 | | preparing | 0.000009 | | executing | 0.000001 | | Sending data | 1.193679 | | end | 0.000011 | | query end | 0.000010 | | closing tables | 0.000002 | | removing tmp table | 0.001491 | | closing tables | 0.000011 | | freeing items | 0.000020 | | logging slow query | 0.000002 | | logging slow query | 0.000050 | | cleaning up | 0.000003 | +---------------------------+----------+
再次分析,我們發現 converting HEAP to MyISAM 這個很耗時
mysql> select @@max_heap_table_size/1024/1024; +---------------------------------+ | @@max_heap_table_size/1024/1024 | +---------------------------------+ | 16.00000000 | +---------------------------------+ mysql> set max_heap_table_size = 16777216*4; Query OK, 0 rows affected (0.00 sec) mysql> select @@max_heap_table_size/1024/1024; +---------------------------------+ | @@max_heap_table_size/1024/1024 | +---------------------------------+ | 64.00000000 | +---------------------------------+
再看看執行結果 1.77579775 > 1.68962725 ,還是有效果的 mysql> show profiles; +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ | 17 | 1.68962725 | SELECT a.id,b.name,b.rule_id FROM (select max(id) as id from tt group by name, rule_id) as a left join tt b on a.id = b.id | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ 分析
mysql> show profile for query 17; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000120 | | checking permissions | 0.000004 | | checking permissions | 0.000003 | | Opening tables | 0.000106 | | System lock | 0.000113 | | optimizing | 0.000007 | | statistics | 0.000044 | | preparing | 0.000011 | | executing | 0.000006 | | Sorting result | 0.000002 | | Sending data | 0.567858 | | init | 0.000032 | | optimizing | 0.000004 | | statistics | 0.000017 | | preparing | 0.000015 | | executing | 0.000002 | | Sending data | 1.120159 | | end | 0.000011 | | query end | 0.000005 | | closing tables | 0.000002 | | removing tmp table | 0.001020 | | closing tables | 0.000011 | | freeing items | 0.000018 | | logging slow query | 0.000002 | | logging slow query | 0.000056 | | cleaning up | 0.000004 | +----------------------+----------+
好看多了,耗時的地方都在 Sending data 上了,如果硬盤換成PCI-SSD 估計又能提高不少 還有沒有其他方法呢,當然有,那麼換一種寫法 先看查詢計劃 mysql> explain select max(concat_ws(' ',lpad(id,5,' '),status)) as res, name, rule_id from tt group by name, rule_id ; +----+-------------+-------+-------+---------------+------+---------+------+---------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---------+-------+ | 1 | SIMPLE | tt | index | NULL | ttx | 52 | NULL | 1176818 | | +----+-------------+-------+-------+---------------+------+---------+------+---------+-------+ 再看看執行結果 mysql> show profiles; +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ | 22 | 1.82505025 | select max(concat_ws(' ',lpad(id,5,' '),status)) as res, name, rule_id from tt group by name, rule_id | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------+ 好像更慢了 1.82505025 > 1.68962725 分析一下
mysql> show profile for query 22; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000157 | | checking permissions | 0.000007 | | Opening tables | 0.000019 | | System lock | 0.000020 | | init | 0.000032 | | optimizing | 0.000005 | | statistics | 0.000016 | | preparing | 0.000012 | | executing | 0.000008 | | Sorting result | 0.000003 | | Sending data | 1.824677 | | end | 0.000012 | | query end | 0.000005 | | closing tables | 0.000009 | | freeing items | 0.000016 | | logging slow query | 0.000002 | | logging slow query | 0.000049 | | cleaning up | 0.000004 | +----------------------+----------+
時間基本上都花費在send data了 這裡說一下,max + group by 完全就是一個坑,如果有多個字段要返回數據不能這樣寫 select max(id) ,type ,name from table where type=xx group by name 因為group by默認返回第一條記錄 如果像下面那樣寫,肯定會遇到坑的 select max(id) as res, name, rule_id from tt group by name, rule_id ; 這樣也可以,而且快多了,但是如果有其他字段怎麼辦 這樣的語句基本上沒有什麼可以再優化了,只能換換其他方式了,比如:換SSD+raid10 ,分區/分表/分庫+中間件了