SQL优化选对执行计划,查询速度提升1000倍 | OceanBase 应用实践

作者:爱可生数据库高级工程师任仲禹,擅长故障分析和性能优化。

本文通过一个案例,分享使用OceanBase时,SQL走错执行计划,而导致慢SQL的排查方法论。

案例背景

在使用OceanBase 3.2.3 版本的过程中,项目组反映某个 SELECT 语句在指定时间内的查询响应速度异常缓慢,其耗时远超正常情况的1000倍以上。具体细节如下:

  • 慢 SELECT
SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31' ;
  • 关键表结构、记录数信息如下
-- 脱敏处理
show create table renzy\G                  
*************************** 1. row ***************************
       Table: renzy
Create Table: CREATE TABLE `renzy` (
  `ID` char(18) COLLATE utf8mb4_bin NOT NULL COMMENT ,
...
  `ACCT_NO` char(40) COLLATE utf8mb4_bin NOT NULL COMMENT ,
...
  `ACCTG_DT` date DEFAULT NULL COMMENT ,
...
  PRIMARY KEY (`ID`),
...
  KEY `renzy_I2` (`ACCT_NO`) BLOCK_SIZE 16384 LOCAL,
...
  KEY `renzy_I5` (`ACCTG_DT`, `ENQ_INST_CD`, `BLON_INST_CD`, `EMRG_STPY_SRC_CD`) BLOCK_SIZE 16384 LOCAL,
...
) DEFAULT CHARSET = utf8mb4;
1 row in set (0.01 sec)

MySQL > SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882');
+---------+
| TOT_CNT |
+---------+
|       1 |
+---------+
1 row in set (0.02 sec)

MySQL > SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31';
+----------+
| TOT_CNT  |
+----------+
| 25432155 |
+----------+
1 row in set (12.42 sec)

MySQL > SELECT COUNT(*) AS TOT_CNT FROM renzy;                                                            
+----------+
| TOT_CNT  |
+----------+
| 25435024 |
+----------+
1 row in set (10.65 sec)

排查过程

正常执行不慢

MySQL > select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000; select last_trace_id();
+---------+
| TOT_CNT |
+---------+
|       1 |
+---------+
1 row in set (0.02 sec)

以下是执行计划,从中可见,索引I2是最高效的选择,它在进行等值匹配时仅需要执行一次回表操作。

*************************** 1. row ***************************
Query Plan: ========================================================
|ID|OPERATOR        |NAME               |EST. ROWS|COST|
--------------------------------------------------------
|0 |LIMIT           |                   |1        |92  |
|1 | SCALAR GROUP BY|                   |1        |92  |
|2 |  TABLE SCAN    |renzy(renzy_I2)|1        |92  |
========================================================
...
Outline Data:
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$2" "gabsdb.renzy"@"SEL$2" "renzy_I2")
      END_OUTLINE_DATA
  */
...
renzy:table_rows:25419080, physical_range_rows:1, logical_range_rows:1, index_back_rows:1, output_rows:0, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[renzy_I2,renzy_I5], pruned_index_name[renzy_I1,renzy_I3,renzy_I4,renzy_I6], unstable_index_name[renzy], estimation info[table_id:1105009185965290, (table_type:1, version:0-1699898410195654-1699898410195654, logical_rc:1, physical_rc:1), (table_type:7, version:1699898401860480-1699898401860480-1699898433101378, logical_rc:0, physical_rc:0), (table_type:7, version:1699898433101378-1699904137032515-1699905915658079, logical_rc:0, physical_rc:0), (table_type:5, version:1699898433101378-1699904137032515-1699905915658079, logical_rc:0, physical_rc:0), (table_type:0, version:1699905915658079-1699905915658079-9223372036854775807, logical_rc:0, physical_rc:0)]
...

通过 OCP 的 SQL 诊断获取慢 SQL 的 plan_id,检查慢 SQL 实际命中的 plan。

MySQL [oceanbase]> select * from gv$plan_cache_plan_stat where plan_id=7288229 \G                                                                         
*************************** 1. row ***************************
...
              plan_id: 7288229
...
            statement: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = ?) AND LAWENF_NTIST_TP_CD NOT LIKE ? AND LAWENF_NTIST_TP_CD NOT LIKE ? AND EMRG_STPY_SRC_CD != ? AND ACCTG_DT >= ? AND ACCTG_DT <= ?) as orginal limit 2000
            query_sql: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '' AND ACCTG_DT <= '') as orginal limit 2000
       special_params: 2000
          param_infos: {1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,22},{1,0,0,-1,17},{1,0,0,-1,17}
             sys_vars: 45,45,12582912,2,4,1,0,0,32,3,1,0,1,1,0,10485760,1,1,0,1,BINARY,BINARY,AL32UTF8,AL16UTF16,BYTE,FALSE,1,100,64,200,0,13,NULL,1,1,1,1
            plan_hash: 10428103352368081688
      first_load_time: 2023-11-14 10:14:11.578250
       schema_version: 1699927892190832
       merged_version: 287
     last_active_time: 2023-11-14 11:04:58.127020
         avg_exe_usec: 35858760
     slowest_exe_time: 2023-11-14 11:04:58.127020
     slowest_exe_usec: 171575101
           slow_count: 2
            hit_count: 7
            plan_size: 81984
           executions: 8
           disk_reads: 1136285
        direct_writes: 0
          buffer_gets: 18067948
application_wait_time: 0
concurrency_wait_time: 0
    user_io_wait_time: 0
       rows_processed: 8
         elapsed_time: 286870087
             cpu_time: 229807460
         large_querys: 2
 delayed_large_querys: 1
    delayed_px_querys: 0
      outline_version: 0
           outline_id: -1
         outline_data: /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$2" "gabsdb.renzy"@"SEL$2" "renzy_I5") END_OUTLINE_DATA*/
....
1 row in set (0.09 sec)

MySQL [oceanbase]> select * from oceanbase.gv$plan_cache_plan_explain where tenant_id=1005 and port=2882 and plan_id=7288229 and ip='12.240.26.70'\G
....
PLAN_LINE_ID: 2
    OPERATOR:   PHY_TABLE_SCAN
        NAME: renzy(renzy_I5)
        ROWS: 0
        COST: 91
    PROPERTY: table_rows:25419080, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:0, est_method:local_storage, avaiable_index_name[renzy_I2,renzy_I5]
...

上述结果的关键信息如下

1.query_sql :为该plan第一次执行时的SQL语句。

2.first_load_time :缓存该plan并hit的时间。

3.slowest_exe_usec :该计划的最慢耗时。

4.outline_id : 是否命中了绑定的outline,-1即未命中。

5.statement :参数化后的SQL语句。

6.name : 该plan走的索引。

分析下第一次的SQL为啥要走 I5 索引

通过下面执行计划和执行耗时可知,第一次执行的语句因为字段 ACCTG_DT 检索不到数据,所以走 I5 效率最高。

MySQL > explain extended select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '' AND ACCTG_DT <= '') as orginal limit 2000\G
*************************** 1. row ***************************
Query Plan: ========================================================
|ID|OPERATOR        |NAME               |EST. ROWS|COST|
--------------------------------------------------------
|0 |LIMIT           |                   |1        |92  |
|1 | SCALAR GROUP BY|                   |1        |92  |
|2 |  TABLE SCAN    |renzy(renzy_I5)|0        |92  |
========================================================
Outline Data:
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$2" "gabsdb.renzy"@"SEL$2" "renzy_I5")
      END_OUTLINE_DATA
  */
renzy:table_rows:25419080, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:0, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[renzy_I2,renzy_I5], pruned_index_name[renzy_I1,renzy_I3,renzy_I4,renzy_I6], unstable_index_name[renzy]

MySQL > select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '' AND ACCTG_DT <= '') as orginal limit 2000;
+---------+
| TOT_CNT |
+---------+
|       0 |
+---------+
1 row in set, 2 warnings (0.02 sec)

分析下后续SQL为何不淘汰该plan

我们知道,SQL查询并不需要每次生成查询计划,因为这样涉及到硬解析等耗费性能的操作,所以默认每次会先查询 Plan Cache (硬解析操作包含词法/语法/语义解析,优化器统计信息查询等步骤,参考下图)。

1716357409

本案例中,后续的SQL命中该 Plan 就可以理解,因为要走 I5 索引,range太大基本为全索引扫描,所以耗时太慢。,

ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31'

什么时候淘汰这个计划呢?

// 关键代码段如下(410bp1社区版,这里的逻辑和323bp8企业版类似,企业版代码不便贴出)
if (sample_count < SLOW_QUERY_SAMPLE_SIZE) {
        // do nothing when query execution samples are not enough
      } else {
        if (stat_.cpu_time_ <= SLOW_QUERY_TIME_FOR_PLAN_EXPIRE * stat_.execute_times_) {
        // do nothing for fast query
        } else if (is_plan_unstable(sample_count, sample_exec_row_count, sample_exec_usec)) {
          set_is_expired(true);
        }
        ATOMIC_STORE(&(stat_.sample_times_), 0);
      }
    }

bool ObPhysicalPlan::is_plan_unstable(const int64_t sample_count,
                                      const int64_t sample_exec_row_count,
                                      const int64_t sample_exec_usec)
{
  bool bret = false;
  if (sample_exec_usec <= SLOW_QUERY_TIME_FOR_PLAN_EXPIRE * sample_count) {
    // sample query is fast query in the average
  } else if (OB_PHY_PLAN_LOCAL == plan_type_) {
    int64_t first_query_range_rows = ATOMIC_LOAD(&stat_.first_exec_row_count_);
    if (sample_exec_row_count <= SLOW_QUERY_ROW_COUNT_THRESOLD * sample_count) {
      // the sample query does not accesses too many rows in the average
    } else if (sample_exec_row_count / sample_count > first_query_range_rows * 10) {
      // the average sample query range row count increases great
      bret = true;
      LOG_INFO("local query plan is expired due to unstable performance",
               K(bret), K(stat_.execute_times_),
               K(first_query_range_rows), K(sample_exec_row_count), K(sample_count));
    }
  } else if ( OB_PHY_PLAN_DISTRIBUTED == plan_type_) {
    int64_t first_exec_usec = ATOMIC_LOAD(&stat_.first_exec_usec_);
    if (sample_exec_usec / sample_count > first_exec_usec * 2) {
      // the average sample query execute time increases great
      bret = true;
      LOG_INFO("distribute query plan is expired due to unstable performance",
               K(bret), K(stat_.execute_times_), K(first_exec_usec),
               K(sample_exec_usec), K(sample_count));
    }
  } else {
    // do nothing
  }
  return bret;
}

这里淘汰一个 Plan 需要满足的条件有2个:

  • sample_count < SLOW_QUERY_SAMPLE_SIZE)
  • sample_exec_row_count / sample_count > first_query_range_rows * 10

这里的 SLOW_QUERY_SAMPLE_SIZE 是常量,OB410的定义是 20;sample_count(采样次数)实质为Plan的SQL执行次数。

static const int64_t SLOW_QUERY_SAMPLE_SIZE = 20; // smaller than ObPlanStat::MAX_SCAN_STAT_SIZE

结合上下文代码来看,意思是满足如下情况就会淘汰Plan:

  • 命中该Plan的SQL执行大于等于20次。
  • (执行的SQL扫描总行数 / 执行次数) 大于  (第一次SQL执行扫描的行数 * 10)

复现以验证

1.清空 plan cache,执行业务第一次生成 Plan 的 SQL。

MySQL > alter system flush plan cache;
Query OK, 0 rows affected (0.13 sec)

MySQL > select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '' AND ACCTG_DT <= '') as orginal limit 2000;
+---------+
| TOT_CNT |
+---------+
|       0 |
+---------+
1 row in set, 2 warnings (0.02 sec)

2.执行业务 SQL,复现慢的情况。

MySQL > select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000;
+---------+
| TOT_CNT |
+---------+
|       1 |
+---------+
1 row in set (2 min 51.61 sec)

MySQL > select last_trace_id();
+-----------------------------------+
| last_trace_id()                   |
+-----------------------------------+
| YB420CF01A46-0006009AD91C51ED-0-0 |
+-----------------------------------+
1 row in set (0.04 sec)

MySQL > select * from oceanbase.gv$sql_audit where trace_id='YB420CF01A46-0006009AD91C51ED-0-0'\G                                                                                                         ...
               TRACE_ID: YB420CF01A46-0006009AD91C51ED-0-0
...
                 SQL_ID: 2B53F4C1C330C2C089C7518CD71D667A
              QUERY_SQL: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000
...
           ELAPSED_TIME: 171575101
...
           EXECUTE_TIME: 171574843
...
MEMSTORE_READ_ROW_COUNT: 25416176
 SSSTORE_READ_ROW_COUNT: 50832349
...

这里通过 sql_audit 可以观测到重要的信息:

  • ELAPSED_TIME : 执行耗时。
  • MEMSTORE_READ_ROW_COUNT / SSSTORE_READ_ROW_COUNT : 这条SQL扫描的行数。
MySQL [oceanbase]> select * from gv$plan_cache_plan_stat where plan_id=7289113 \G                                                                         
*************************** 1. row ***************************
...
               sql_id: 2B53F4C1C330C2C089C7518CD71D667A
...
            statement: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = ?) AND LAWENF_NTIST_TP_CD NOT LIKE ? AND LAWENF_NTIST_TP_CD NOT LIKE ? AND EMRG_STPY_SRC_CD != ? AND ACCTG_DT >= ? AND ACCTG_DT <= ?) as orginal limit 2000
            query_sql: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '' AND ACCTG_DT <= '') as orginal limit 2000
...
           outline_id: -1
         outline_data: /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$2" "gabsdb.renzy"@"SEL$2" "renzy_I5") END_OUTLINE_DATA*/
...

通过 plan_cache_plan_stat 可看到这条SQL命中了第一次SQL执行时生成的 Plan(不符合预期)。

3.继续通过脚本执行多次。

#!/bin/bash
for i in `seq 1 30`
do
echo ">>> do  $i"
mysql -h12.240.68.36 -P3306 -uroot@tgabsua2g00#obcdcbsuat01 -pOceanBase_123# -Dgabsdb -A -c -NBe "select now();select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000; select last_trace_id();select now();"
done

# ./1.sh
...
>>> do  17       # 耗时 2分钟27s 命中
2023-11-14 16:05:36
1
YB420CF01A46-0006009B016C66EE-0-0
2023-11-14 16:08:03
>>> do  18       # 耗时 2min12s  命中
2023-11-14 16:08:03
1
YB420CF01A46-0006009AFF8FF46D-0-0
2023-11-14 16:10:15
>>> do  19       # 耗时 2min36s  命中
2023-11-14 16:10:15
1
YB420CF01A46-0006009B012FF1D0-0-0
2023-11-14 16:12:51
>>> do  20        # 耗时 1s内     未命中,恢复正常
2023-11-14 16:12:51
1
YB420CF01A46-0006009AFEBDA7C6-0-0
2023-11-14 16:12:51
>>> do  21
2023-11-14 16:12:51
1
YB420CF01A46-0006009B016F1561-0-0
2023-11-14 16:12:52
...

可以观察到,命中该 Plan 的SQL 执行次数大于 20 次(含手工执行)后,该"不符合预期的" Plan 被淘汰。

4. 再次执行的SQL的 sql_audit 和 plan_cache_plan_stat,可看到重新生成了 Plan。

MySQL > select * from oceanbase.gv$sql_audit where trace_id='YB420CF01A46-0006009AFEBDA7C6-0-0'\G                                                                                                         
                 SQL_ID: 2B53F4C1C330C2C089C7518CD71D667A
              QUERY_SQL: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000
...
          ELAPSED_TIME: 207
          PLAN_ID: 7334178
...
MEMSTORE_READ_ROW_COUNT: 1
 SSSTORE_READ_ROW_COUNT: 2               

MySQL [oceanbase]> select * from gv$plan_cache_plan_stat where plan_id=7334178 \G       
*************************** 1. row ***************************
...
              plan_id: 7334178
               sql_id: 2B53F4C1C330C2C089C7518CD71D667A
...
            statement: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = ?) AND LAWENF_NTIST_TP_CD NOT LIKE ? AND LAWENF_NTIST_TP_CD NOT LIKE ? AND EMRG_STPY_SRC_CD != ? AND ACCTG_DT >= ? AND ACCTG_DT <= ?) as orginal limit 2000
            query_sql: select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000
...
      first_load_time: 2023-11-14 16:12:51.547434
...
     slowest_exe_time: 2023-11-14 16:12:51.547618
     slowest_exe_usec: 4139
...
         elapsed_time: 8279
...
           outline_id: -1
         outline_data: /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$2" "gabsdb.renzy"@"SEL$2" "renzy_I2") END_OUTLINE_DATA*/
...

5.obs日志关键信息

#grep YB420CF01A46-0006009B012FF1D0-0-0 observer.log.20231114161*|less
observer.log.20231114161017:[2023-11-14 16:10:15.813150] INFO  [SQL] ob_sql.cpp:1769 [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=17] [dc=0] It is a large query, need delay, do not need disconnect(avg_process_time=123860984, exec_cnt=20, large_query_threshold=5000000, plan->get_plan_id()=7328133, ret=-4023)
observer.log.20231114161017:[2023-11-14 16:10:15.813208] TRACE [TRACE]obmp_base.cpp:156 [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=18] [dc=0] [packet retry query](TRACE=begin_ts=1699949415813080 2023-11-14 08:10:15.813080|[start_sql] u=0 addr:{ip:"12.241.29.28", port:16606}|[process_begin] u=0 addr:{ip:"12.241.29.28", port:16606}, in_queue_time:13, receive_ts:1699949415813066, enqueue_ts:1699949415813067, trace_id:YB420CF01A46-0006009B012FF1D0-0-0|[session] u=3 sid:3221784053, tenant_id:1005|[parse_begin] u=10 stmt:"select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000", stmt_len:287|[process_end] u=85 run_ts:1699949415813082|total_timeu=98)
observer.log.20231114161302:[2023-11-14 16:12:51.412696] INFO  [SQL.ENG] ob_physical_plan.cpp:736 [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=15] [dc=0] local query plan is expired due to unstable performance(bret=true, stat_.execute_times_=21, first_query_range_rows=0, sample_exec_row_count=1525906500, sample_count=20)
observer.log.20231114161302:[2023-11-14 16:12:51.412725] WARN  [SHARE.SCHEMA] revert (ob_schema_mgr_cache.cpp:131) [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=11] [dc=0] long time to hold one guard(schema_mgr=0x7ee87934c610, tenant_id=1, version=1697523399752200, cur_timestamp=1699949571412714, ref_timestamp=1699949415812628, lbt()="0xf51231f 0x6158f04 0x4f5992c 0x50c61cc 0x4ed2f6f 0x4ecf518 0x4ecc8ef 0x4ecaa6e 0xb8c71f1 0x4ec9c90 0xb8c4d31 0x4ec58f6 0xb8c52a7 0xf3f17f3 0xf3f164f 0xf6901df")
observer.log.20231114161302:[2023-11-14 16:12:51.412738] WARN  [SHARE.SCHEMA] revert (ob_schema_mgr_cache.cpp:131) [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=8] [dc=0] long time to hold one guard(schema_mgr=0x7edddba39170, tenant_id=1005, version=1699931135472584, cur_timestamp=1699949571412734, ref_timestamp=1699949415812628, lbt()="0xf51231f 0x6158f04 0x4f5992c 0x50c61cc 0x4ed2f6f 0x4ecf518 0x4ecc8ef 0x4ecaa6e 0xb8c71f1 0x4ec9c90 0xb8c4d31 0x4ec58f6 0xb8c52a7 0xf3f17f3 0xf3f164f 0xf6901df")
observer.log.20231114161302:[2023-11-14 16:12:51.412798] TRACE [TRACE]obmp_base.cpp:147 [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=5] [dc=0] [slow query](TRACE=begin_ts=1699949415813229 2023-11-14 08:10:15.813229|[start_sql] u=0 addr:{ip:"12.241.29.28", port:16606}|[process_begin] u=0 addr:{ip:"12.241.29.28", port:16606}, in_queue_time:162, receive_ts:1699949415813066, enqueue_ts:1699949415813225, trace_id:YB420CF01A46-0006009B012FF1D0-0-0|[session] u=2 sid:3221784053, tenant_id:1005|[parse_begin] u=6 stmt:"select * from (SELECT COUNT(*) AS TOT_CNT FROM renzy WHERE (ACCT_NO = '6222620117273900882') AND LAWENF_NTIST_TP_CD NOT LIKE '12%' AND LAWENF_NTIST_TP_CD NOT LIKE '05%' AND EMRG_STPY_SRC_CD != 'JZ05' AND ACCTG_DT >= '1900-01-01' AND ACCTG_DT <= '2025-03-31') as orginal limit 2000", stmt_len:287|[exec_begin] u=29 arg1:false, end_trans_cb:false, plan_id:7328133|[do_open_plan_begin] u=8 |[sql_start_stmt_begin] u=1 |[sql_start_participant_begin] u=5 |[storage_table_scan_begin] u=56 |[storage_table_scan_end] u=116 |[get_row] u=155437570 |[result_set_close] u=161554 |[close_plan_begin] u=0 |[revert_scan_iter] u=96 |[end_participant_begin] u=3 |[start_end_stmt] u=1 |[affected_rows] u=0 affected_rows:-1|[store_found_rows] u=1 found_rows:0, return_rows:1|[auto_end_plan_begin] u=0 |[process_end] u=86 run_ts:1699949415813230|total_timeu=155599534)

第21次执行的SQL的关键日志信息:

[2023-11-14 16:12:51.412696] INFO  [SQL.ENG] ob_physical_plan.cpp:736 [86881][0][YB420CF01A46-0006009B012FF1D0-0-0] [lt=15] [dc=0] local query plan is expired due to unstable performance(bret=true, stat_.execute_times_=21, first_query_range_rows=0, sample_exec_row_count=1525906500, sample_count=20)

由该日志,关键信息如下:

1. sample_exec_row_count=1525906500

2. sample_count=20

3. first_query_range_rows=0

结合代码可知该结果满足 Plan 淘汰条件,从而 plan expire。

sample_exec_row_count / sample_count > first_query_range_rows * 10

1525906500 / 20  > 0 * 10  
// 这里 1525906500 的结果,不难得知,是单次SQL扫描行数 * 20. 
// 即(25416176 + 50832349) * 20 = 1524970500 约等于 1525906500 
 MEMSTORE_READ_ROW_COUNT: 25416176
 SSSTORE_READ_ROW_COUNT: 50832349

结论

1.本例主要是想分享SQL走错 Plan 而SQL慢的排查方法论,问题原因还是比较简单,重点是和大家分享处理OB遇到类似问题的思路等。

2.本例问题在当前OB 323版本中没有好的优化方式,给到的建议是:

  • 如果 I5 索引业务上未使用场景,则删除。
  • 绑定 outline,使该SQL走 I2 索引。

3.分享下OB中 Plan Cache 清理策略:

  • 手工清理
-- 租户内执⾏,清除当前租户中所有 Plan Cache。⽣产慎⽤。
ALTER SYSTEM FLUSH PLAN CACHE;
-- sys租户下执⾏,不同粒度。
ALTER SYSTEM FLUSH PLAN CACHE TENANT = 'T_MySQL';
ALTER SYSTEM FLUSH PLAN CACHE sql_id='B601070DFC14CB85FDA3766A69A9E1B3'
databases='myob1' tenant='tenant1' GLOBAL;
  • 自动清理 ob_plan_cache_percentage 参数控制 Plan Cache占用租户内存的百分比。 本例中提到

1. sample_count < SLOW_QUERY_SAMPLE_SIZE) :命中该Plan的SQL执行大于等于20次。 

2.sample_exec_row_count / sample_count > first_query_range_rows * 10 :(执行的SQL扫描总行数 / 执行次数) 大于 (第一次SQL执行扫描的行数 * 10)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/707932.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

【Da-SimaRPN】《Distractor-aware Siamese Networks for Visual Object Tracking》

ECCV-2018 中科大 文章目录 1 Background and Motivation2 Related Work3 Advantages / Contributions4 Method4.1 Features and Drawbacks in Traditional Siamese Networks4.2 Distractor-aware Training4.3 Distractor-aware Incremental Learning4.4 DaSiamRPN for Long-t…

【详解Python文件: .py、.ipynb、.pyi、.pyc、​.pyd !】

今天同事给我扔了一个.pyd文件&#xff0c;说让我跑个数据。然后我就傻了。。 不知道多少粉丝小伙伴会run .pyd代码文件&#xff1f;如果你也懵懵的&#xff0c;请继续往下读吧。。 Python文件是存储Python代码或数据的文本文件&#xff0c;通常以.py作为文件扩展名。这些文件…

k8s 自动伸缩机制-------HPA 超详细解读

目录 在K8s中扩缩容分为两种&#xff1a; 前言 弹性伸缩是根据用户的业务需求和策略&#xff0c;自动“调整”其“弹性资源”的管理服务。通过弹性伸缩功能&#xff0c;用户可设置对定时、周期或监控策略&#xff0c;恰到好处地增加或减少“弹性资源”&#xff0c;并完成实例…

单片机课设-基于单片机的电子时钟设计(仿真+代码+报告)

基于单片机的电子时钟设计 前言一、课设任务是什么?二、系统总体方案硬件设计2.1 系统硬件总体设计2.2 键盘电路设计2.3 DS1302实时时钟芯片电路设计2.4 复位电路2.5 LCD电路设计 三、软件设计3.1 主程序流程图3.2 主要程序设计代码3.3 修改时间函数3.4 扫描键盘函数 四、仿真…

基坑监测:关键环节与深入剖析,保障施工安全与质量新标准

在建筑工程中&#xff0c;基坑监测是一项至关重要的工作&#xff0c;它涉及对基坑施工现场的实时监测数据进行分析和评估&#xff0c;以确保基坑施工活动的稳定、安全和高效进行。基坑监测涵盖地质勘探、基坑开挖、加固、支护、周边环境以及工程质量验收等多个环节&#xff0c;…

FlashDB的TS数据库的标准ANSI C移植验证

本文目录 1、引言2、环境准备3、修改驱动4、验证 文章对应视频教程&#xff1a; 暂无&#xff0c;可以关注我的B站账号等待更新。 点击图片或链接访问我的B站主页~~~ 1、引言 在当今数据驱动的时代&#xff0c;高效可靠的数据存储与管理对于嵌入式系统及物联网(IoT)应用至关重…

【C++继承解密】:构建层次化设计的艺术

&#x1f4c3;博客主页&#xff1a; 小镇敲码人 &#x1f49a;代码仓库&#xff0c;欢迎访问 &#x1f680; 欢迎关注&#xff1a;&#x1f44d;点赞 &#x1f442;&#x1f3fd;留言 &#x1f60d;收藏 &#x1f30f; 任尔江湖满血骨&#xff0c;我自踏雪寻梅香。 万千浮云遮碧…

Qt信号槽的回调机制

问&#xff1a;Qt强大的地方在哪里&#xff1f; 答&#xff1a;跨平台、信号槽。。。 问&#xff1a;信号槽是什么&#xff1f; 答&#xff1a;回调函数 问&#xff1a;怎么个回调法子 答&#xff1a;。。。 成果 信号槽本身实现过程是有些复杂的&#xff0c;所以本人参考…

代理模式与静态代理、动态代理的实现(Proxy.newProxyInstance、InvocationHandler)

代理模式 代理模式是23种设计模式中比较常用的一种&#xff0c;属于结构型设计模式。在 Android 领域中&#xff0c;有大量的库都使用了代理模式&#xff0c;例如 Retrofit 使用动态代理来实现 API 接口的调用&#xff0c;Dagger 使用代码生成和反射机制来创建依赖注入的代理对…

Linux——ansible剧本

剧本&#xff08;playbook&#xff09; 现在&#xff0c;可以写各种临时命令 但如果&#xff0c;想把所有步骤&#xff0c;集合到一起&#xff0c;写到同一个文件里 让ansible自动按顺序执行 就必须要写“剧本” 剧本里面&#xff0c;也可以写临时命令&#xff0c;但是剧本…

大数据集成平台建设方案(Word方案)

基础支撑平台主要承担系统总体架构与各个应用子系统的交互&#xff0c;第三方系统与总体架构的交互。需要满足内部业务在该平台的基础上&#xff0c;实现平台对于子系统的可扩展性。基于以上分析对基础支撑平台&#xff0c;提出了以下要求&#xff1a; 基于平台的基础架构&…

自动驾驶---Perception之视觉点云雷达点云

1 前言 在自动驾驶领域&#xff0c;点云技术的发展历程可以追溯到自动驾驶技术的早期阶段&#xff0c;特别是在环境感知和地图构建方面。 在自动驾驶技术的早期技术研究中&#xff0c;视觉点云和和雷达点云都有出现。20世纪60年代&#xff0c;美国MIT的Roberts从2D图像中提取3D…

荣耀手机删除系统APP

1、打开开发者模式 设置–系统–关于手机–快速多次点击手机的版本号&#xff0c;即可进入开发者模式。 然后进入开发人员选项&#xff0c;开启USB调试&#xff0c;如下图。 2、数据线连接电脑&#xff0c;检查设备连接情况 按键盘winR键&#xff0c;在弹窗中输入cmd&#…

idea有这个类却报红,无法用快捷键找到

idea有这个类却报红&#xff0c;无法用快捷键找到&#xff0c;但是项目启动却没有任何问题&#xff0c;严重影响到了开发效率&#xff0c;关idea 重新打开没有用。 找了一圈&#xff0c;办法如下&#xff1a; 1、点击左上角的 File—>Invalidate Caches/Restar 2、点击 In…

PostgreSQL和Oracle的数据类型对比:时间类型 #PG培训

在数据库管理系统中&#xff0c;时间数据类型是非常关键的一部分。时间数据类型的选择和使用直接影响到数据存储、查询效率和应用程序的设计。本文将对比PostgreSQL和Oracle在时间类型方面的实现和特性。 #PG考试#postgresql培训#postgresql考试#postgresql认证 日期和时间类型…

MFC基础学习应用

MFC基础学习应用 1.基于对话框的使用 左上角为菜单键&#xff08;其下的关于MFC主要功能由IDD_ABOUTBOX决定) 附图 右下角为按钮&#xff08;基本功能由IDD_DIALOG决定,添加按钮使用由左上角的工具箱完成) 附图 2.自行添加功能与按钮//功能代码 void CMFCApplication4Dlg:…

渗透测试练习题解析 6 (CTF web)

1、[HCTF 2018]admin 1 考点&#xff1a;二次注入 先注册一个账号看看&#xff0c;注册 admin 会提示该用户名已存在&#xff0c;那我们就换一个用户名 观察页面功能&#xff0c;存在一个修改密码&#xff0c;开始的思路是想看看能否通过该密码功能抓包然后修改用户名为 admin …

react用ECharts实现组织架构图

找到ECharts中路径图。 然后开始爆改。 <div id{org- name} style{{ width: 100%, height: 650, display: flex, justifyContent: center }}></div> // data的数据格式 interface ChartData {name: string;value: number;children: ChartData[]; } const treeDep…

数据结构与算法笔记:基础篇 -图的表示:如何存储微博、微信等社交网络中的好友关系?

概述 微博、微信这些社交软件你肯定玩过吧。在微博中&#xff0c;两个人可以互相关注&#xff1b;在微信中&#xff0c;两个人可以互加好友。那你知道&#xff0c;如何存储微博、微信等这些社交网络的好友关系吗&#xff1f; 这就用到本章讲解的这种数据结构&#xff1a;图。…

watch什么场景下会被影响?

❌1、当组件通过import这种方式&#xff0c;子组件watch由于加载慢&#xff0c;不会被执行 ❌定位上发问题 1、当前页面刷新&#xff0c;以为是watch绑定的值没有改变&#xff0c;通过workflowId null 改变&#xff0c;子组件还是不会触发watch&#xff0c; 2、 当前页面刷新…