项目场景
项目上的ETL模型里有如下SQL语句。执行速度非常慢,每次只查询200条数据,但却需要20多秒的时间。再加上该SQL查询出的数据同步频率很高,这个速度是完全不能忍受的。
因为项目隐私,所以对表及字段做了改写。
SELECT
IDO.OD_SN
FROM IDO
LEFT JOIN IMO ON IMO.OD_SN= IDO.OD_SN
WHERE IMO.OD_TYPE IN ('X','Y')
AND IMO.SOURCE_ID IS NULL
AND IMO.MODIFY_TIME >= '20240423000000'
AND (IMO.YZ = 'N' OR IDO.YZ = 'N')
AND ROWNUM <= 200
IMO表的数据量:18134780行
IDO表的数据量:2908979行
上述SQL的结果集数量也很明显:200行
问题分析
上面的SQL对于我等凡人来说,没办法一眼看出哪里有问题;所以还是需要拉一下执行计划(获取执行计划方法文章链接:获取执行计划)。
下面是问题SQL的执行计划,是已经将无关的信息删除。这里是获取的内存中shard_pool的执行计划,是真实的执行计划。
-
Plan hash value: 1275918432 -------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 2051 (100)| | 100 |00:00:29.08 | 4057K| 1 | |* 1 | COUNT STOPKEY | | 1 | | | | | 100 |00:00:29.08 | 4057K| 1 | | 2 | NESTED LOOPS | | 1 | 102 | 6936 | 2051 (1)| 00:00:01 | 100 |00:00:29.08 | 4057K| 1 | | 3 | NESTED LOOPS | | 1 | 102 | 6936 | 2051 (1)| 00:00:01 | 100 |00:00:29.08 | 4056K| 1 | |* 4 | TABLE ACCESS FULL | IMO | 1 | 21724 | 954K| 1846 (1)| 00:00:01 | 100 |00:00:29.08 | 4056K| 0 | |* 5 | INDEX UNIQUE SCAN | UK_20230901211220_1065023 | 100 | 1 | | 1 (0)| 00:00:01 | 100 |00:00:00.01 | 202 | 1 | |* 6 | TABLE ACCESS BY INDEX ROWID| IDO | 100 | 1 | 23 | 2 (0)| 00:00:01 | 100 |00:00:00.01 | 100 | 0 | -------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=100) 4 - filter(("IMO"."MODIFY_TIME">='20240401000000' AND INTERNAL_FUNCTION("IMO"."OD_TYPE") AND "IMO"."SOURCE_ID" IS NULL)) 5 - access("IMO"."OD_SN"="IDO"."OD_SN") 6 - filter((DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00279$",5)),NULL,NVL("IMO"."YZ",'N'),'0',NVL("IMO"."YZ",'N'),'1',"IMO"." YZ")='N' OR DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00155$",0)),NULL,NVL("IDO"."YZ",'N'),'0',NVL("IDO"."YZ",'N'),'1' ,"IDO"."YZ")='N'))
从上面的执行计划中,我们可以逐步分析:
- 连接方式 (id=2、id=3)
IDO表和IMO表的连接方式是 NESTED LOOPS ,因为只返回少量数据(200行),所以走嵌套循环连接完全没问题。
- 访问路径(id=4)
IMO表的访问路径是TABLE ACCESS FULL;上文已经提过,IMO表有18134780行数据,走全表扫描,还是返回少量数据;走全表扫描肯定是错误的
- 访问路径(id=6)
IMO表的访问路径是TABLE ACCESS BY INDEX ROWID,索引ROWID扫描,没有问题。
- 谓词信息
从谓词信息或者SQL语句中,我们可以发现IMO表中的MODIFY_TIME、SOURCE_ID、OD_TYPE字段中发生了谓词过滤。
从上面的信息,我们可以得出以下优化结论了:
Ⅰ:需要让IMO表走索引扫描;
Ⅱ:可以在IMO表上建立MODIFY_TIME、SOURCE_ID、OD_TYPE三个字段的组合索引;其中MODIFY_TIME的选择性最大,OD_TYPE的选择性其次,SOURCE_ID的选择性最差。所以选择MODIFY_TIME作为组合索引的先导列。
优化方案
创建组合索引
CREATE INDEX idx_mtime_type_source ON IMO (MODIFY_TIME,OD_TYPE,SOURCE_ID) ONLINE;
再次执行SQL,发现只需要0.1秒就可以执行完成。
我们此时再来看下执行计划,发现IMO表已经走了索引扫描;组合索引已经起到效果。
Plan hash value: 1019133023
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 19528 (100)| | 100 |00:00:00.02 | 401 | 49 |
|* 1 | COUNT STOPKEY | | 1 | | | | | 100 |00:00:00.02 | 401 | 49 |
| 2 | VIEW | VW_ORE_7D109085 | 1 | 103 | 2060 | 19528 (1)| 00:00:01 | 100 |00:00:00.02 | 401 | 49 |
| 3 | UNION-ALL | | 1 | | | | | 100 |00:00:00.02 | 401 | 49 |
| 4 | NESTED LOOPS | | 1 | 102 | 6732 | 195 (0)| 00:00:01 | 100 |00:00:00.02 | 401 | 49 |
|* 5 | TABLE ACCESS BY INDEX ROWID BATCHED | IMO | 1 | 21723 | 954K| 93 (0)| 00:00:01 | 100 |00:00:00.01 | 203 | 36 |
|* 6 | INDEX RANGE SCAN | IDX_MTIME_TYPE_SOURCE | 1 | 21744 | | 5 (0)| 00:00:01 | 102 |00:00:00.01 | 10 | 9 |
|* 7 | INDEX UNIQUE SCAN | UK_20230901211220_1065023 | 100 | 1 | 21 | 1 (0)| 00:00:01 | 100 |00:00:00.01 | 198 | 13 |
| 8 | NESTED LOOPS | | 0 | 1 | 68 | 19332 (1)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
| 9 | NESTED LOOPS | | 0 | 1 | 68 | 19332 (1)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
|* 10 | TABLE ACCESS BY INDEX ROWID BATCHED| IMO | 0 | 1 | 45 | 19330 (1)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
|* 11 | INDEX RANGE SCAN | IDX_MTIME_TYPE_SOURCE | 0 | 21744 | | 608 (1)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
|* 12 | INDEX UNIQUE SCAN | UK_20230901211220_1065023 | 0 | 1 | | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
|* 13 | TABLE ACCESS BY INDEX ROWID | IDO | 0 | 1 | 23 | 2 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
5 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00279$",5)),NULL,NVL("IMO"."YZ",'N'),'0',NVL("IMO"."YZ",'N'),'1',"IMO"."YZ
")='N')
6 - access("IMO"."MODIFY_TIME">='20240401000000' AND "IMO"."SOURCE_ID" IS NULL AND "IMO"."MODIFY_TIME" IS NOT NULL)
filter((INTERNAL_FUNCTION("IMO"."OD_TYPE") AND "IMO"."SOURCE_ID" IS NULL))
7 - access("IMO"."OD_SN"="IDO"."OD_SN")
10 - filter(LNNVL(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00279$",5)),NULL,NVL("IMO"."YZ",'N'),'0',NVL("IMO"."YZ",'N'),'1',"IMO"."YZ
")='N'))
11 - access("IMO"."MODIFY_TIME">='20240401000000' AND "IMO"."SOURCE_ID" IS NULL AND "IMO"."MODIFY_TIME" IS NOT NULL)
filter((INTERNAL_FUNCTION("IMO"."OD_TYPE") AND "IMO"."SOURCE_ID" IS NULL))
12 - access("IMO"."OD_SN"="IDO"."OD_SN")
13 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00155$",0)),NULL,NVL("IDO"."YZ",'N'),'0',NVL("IDO"."YZ",'N'),'1',"IDO"."YZ
")='N')