本文来自 2024年OceanBase技术征文大赛——“让技术被看见 | OceanBase 布道师计划”的用户征文。也欢迎更多的技术爱好者参与征文,赢取万元大奖。和我们一起,用文字让代码跳动起来! 参与2024年OceanBase技术征文大赛>>
MySQL在5.7.8版本中引入了JSON数据类型及相应的JSON函数,而在8.0.17版本中,又推出了针对值数组的多值索引。这些功能非常实用,一度成为 MySQL迁移到OceanBase 上的一个阻点。之前,OceanBase 在 3.2.2版本推出了JSON数据类型及JSON函数,4.3.2版本则实现了多值索引。本文将分享OceanBase 4.3.2版本中JSON数据类型及其索引的使用示例和实践。
OB 的 ORACLE 租户和 MySQL 租户都支持 JSON 类型,本文主要演示 MySQL 租户下的 JSON 使用。
首先创建一个含有 JSON 类型的表,示例参考 MySQL 官方文档中 JSON 示例表。
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON
);
INSERT INTO customers VALUES
(NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}'),
(NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}'),
(NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":[94477,94507]}'),
(NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":[94536]}'),
(NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":[94507,94582]}')
;
select * from customers;
JSON 列不支持直接索引,如果想对 JSON 列部分字段做索引,可以新增一个虚拟列,然后针对这个虚拟列创建索引。
ALTER TABLE customers ADD v_user varchar(20) GENERATED ALWAYS AS (json_unquote(json_extract (`custinfo`, _utf8mb4'$.user'))) virtual ;
ALTER TABLE customers ADD KEY idx_user(v_user);
再看看查询虚拟列的执行计划。
mysql> explain SELECT * FROM customers WHERE v_user='Bob';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =============================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| --------------------------------------------------------------- |
| |0 |TABLE RANGE SCAN|customers(idx_user)|1 |7 | |
| =============================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([customers.id], [customers.modified], [customers.custinfo], [column_conv(VARCHAR,utf8mb4_general_ci,length:20,NULL,cast(json_unquote(json_extract(customers.custinfo, |
| '$.user')), VARCHAR(1048576)))]), filter(nil) |
| access([customers.id], [customers.custinfo], [customers.v_user], [customers.modified]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| range_key([customers.v_user], [customers.id]), range(Bob,MIN ; Bob,MAX), |
| range_cond([customers.v_user = 'Bob']) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set, 1 warning (0.02 sec)
上面方法如果要对 JSON 字段 zipcode
做索引就有点困难,这个列的值是个数组。这就用到新推出的多值索引功能。
先看查询场景 SQL 。
EXPLAIN SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
EXPLAIN SELECT * FROM customers WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
EXPLAIN SELECT * FROM customers WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
这三个 SQL 使用了常见的 JSON 函数,执行计划全部是全表扫描就不发了。应对方法就是新增多值索引。
mysql> ALTER TABLE customers ADD INDEX zips( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
ERROR 1235 (0A000): dynamic add multivalue index not supported yet not supported
mysql>
遗憾的是由于 OB 4.3.2 是刚支持多值索引,目前还只实现在建表的时候创建多值索引,暂不支持后期动态添加多值索引。
所以我们再创建一个带多值索引的新表看看。
create table customers2(
id bigint not null auto_increment ,
modified datetime default current_timestamp on update current_timestamp,
custinfo json,
index zips((cast(custinfo->'$.zipcode' as unsigned array))),
INDEX comp(id, modified,(cast(custinfo->'$.zipcode' as unsigned array)))
);
INSERT INTO customers2 SELECT * FROM customers;
为了减少篇幅,我这里一次性创建两类多值索引。一个是针对单列的多值索引,一个是多列组合索引。
mysql> EXPLAIN SELECT * FROM customers2 WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------------------- |
| |0 |TABLE FULL SCAN|customers2(zips)|2 |13 | |
| =========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_MEMBER_OF(94507, JSON_EXTRACT(customers2.custinfo, '$.zipcode'))]) |
| access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([customers2.SYS_NC_mvi_19], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), range(94507,MIN,MIN ; 94507,MAX,MAX) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM customers2 WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------------------- |
| |0 |TABLE FULL SCAN|customers2(zips)|3 |23 | |
| =========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_CONTAINS(JSON_EXTRACT(customers2.custinfo, '$.zipcode'), cast('[94507, |
| 94582]', JSON(536870911)))]) |
| access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([customers2.SYS_NC_mvi_19], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), range(94507,MIN,MIN ; 94507,MAX,MAX), |
| (94582,MIN,MIN ; 94582,MAX,MAX) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM customers2 WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------------------- |
| |0 |TABLE FULL SCAN|customers2(zips)|3 |23 | |
| =========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_OVERLAPS(JSON_EXTRACT(customers2.custinfo, '$.zipcode'), cast('[94507, |
| 94582]', JSON(536870911)))]) |
| access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([customers2.SYS_NC_mvi_19], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), range(94507,MIN,MIN ; 94507,MAX,MAX), |
| (94582,MIN,MIN ; 94582,MAX,MAX) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM customers2 WHERE id = 23 and modified = 103 and 94507 MEMBER OF(custinfo->'$.zipcode');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================ |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------ |
| |0 |TABLE RANGE SCAN|customers2(comp)|1 |7 | |
| ============================================================ |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_MEMBER_OF(94507, JSON_EXTRACT(customers2.custinfo, '$.zipcode'))]) |
| access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([customers2.id], [customers2.modified], [customers2.SYS_NC_mvi_21], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), |
| range(23,2000-01-03 00:00:00.000000,MIN,MIN,MIN ; 23,2000-01-03 00:00:00.000000,MAX,MAX,MAX), |
| range_cond([customers2.id = 23], [customers2.modified = INTERNAL_FUNCTION(103, 110, 17)]) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)
这里直接将 4 种使用多值索引的查询场景 SQL 以及执行计划放出,这执行计划中的细节大家可以一一详细查看。
多值索引的结构如下。
mysql> show indexes from customers2;
+------------+------------+----------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| customers2 | 1 | zips | 1 | SYS_NC_mvi_19 | A | NULL | NULL | NULL | YES | BTREE | available | | YES | json_query(`custinfo`,'$.zipcode' RETURNING unsigned WITHOUT ARRAY WRAPPER asis error on error null on empty null on mismatch) |
| customers2 | 1 | zips | 2 | __pk_increment | A | NULL | NULL | NULL | | BTREE | available | | YES | NULL |
| customers2 | 1 | zips | 3 | __doc_id_1727685398954214 | A | NULL | NULL | NULL | | BTREE | available | | YES | NULL |
| customers2 | 1 | comp | 1 | id | A | NULL | NULL | NULL | | BTREE | available | | YES | NULL |
| customers2 | 1 | comp | 2 | modified | A | NULL | NULL | NULL | YES | BTREE | available | | YES | NULL |
| customers2 | 1 | comp | 3 | SYS_NC_mvi_21 | A | NULL | NULL | NULL | YES | BTREE | available | | YES | json_query(`custinfo`,'$.zipcode' RETURNING unsigned WITHOUT ARRAY WRAPPER asis error on error null on empty null on mismatch) |
| customers2 | 1 | comp | 4 | __pk_increment | A | NULL | NULL | NULL | | BTREE | available | | YES | NULL |
| customers2 | 1 | comp | 5 | __doc_id_1727685398954214 | A | NULL | NULL | NULL | | BTREE | available | | YES | NULL |
+------------+------------+----------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)
还有一类特殊的场景是多值索引的唯一性索引。
CREATE TABLE customers3 (
id BIGINT not null primary key,
modified BIGINT not null,
custinfo JSON,
UNIQUE INDEX zips1( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) )
);
INSERT INTO customers3 VALUES
(10, 21, '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}');
mysql> select * from customers3;
+----+----------+------------------------------------------------------------+
| id | modified | custinfo |
+----+----------+------------------------------------------------------------+
| 10 | 21 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]} |
+----+----------+------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO customers3 VALUES (11, 22, '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}');
ERROR 1062 (23000): Duplicate entry '94582' for key 'zips1'
mysql>
mysql> show indexes from customers3;
+------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| customers3 | 0 | PRIMARY | 1 | id | A | NULL | NULL | NULL | | BTREE | available | | YES | NULL |
| customers3 | 0 | zips1 | 1 | SYS_NC_mvi_19 | A | NULL | NULL | NULL | YES | BTREE | available | | YES | json_query(`custinfo`,'$.zipcode' RETURNING unsigned WITHOUT ARRAY WRAPPER asis error on error null on empty null on mismatch) |
+------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
由此可见唯一性的多值索引能拦截导致多值数组中出现重复的值。
多值索引也有一些功能限制。
- 多值索引不能定义列的顺序 ASC 或 DESC,也不能用于消除排序,多值列不能用于主键。
EXPLAIN SELECT v_user FROM customers order by v_user;
EXPLAIN SELECT custinfo FROM customers2 order by custinfo;