1.1 Overall flow of creating a regular B-tree index
Below is a rough outline of the steps involved in creating a regular B-tree index, for reference.
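1. Validate the catalog metadata for the new index
|Parse the statement --- the CREATE INDEX SQL is parsed into an IndexStmt structure
|Check the B-tree handler --- verify that the server supports this index type by looking up the handler for "btree" in pg_am
|Check the index columns and comparison functions --- look up pg_attribute to verify that the columns named in CREATE INDEX exist; if they do, record their attno, and use atttypid (the data type of the table column) to find the matching operator class in pg_opclass, which supplies the comparison function
2. Create the index file in the file system
|Generate an OID --- generate a unique OID for the new index file: take a new candidate OID, probe the pg_class index, and return the OID if no existing row uses it
|Update the local relcache --- add the relation being created to the relcache
|Create the index file and write WAL --- a new file base/xxx/xxx is created in the file system; the WAL record is written with XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE | XLR_SPECIAL_REL_UPDATE)
3. Create the catalog metadata for the new index
|Insert the index, as a relation, into pg_class
|Insert the index's columns into pg_attribute
|Insert the index-specific information into pg_index
|Invalidate the relcache --- invalidate all cached metadata for the heap (the parent table) so that the catalog changes take effect in every backend
|Record the index's dependencies (on the heap, on its opclass, and so on) in pg_depend
|Make the relcache entry for the new index valid
4. Build the B-tree index with btbuild
|Build the sort keys from the index columns, scan the heap tuples, and generate the array of index tuples
|Build the B+ tree leaf nodes
|Sort the index tuples
|Insert the sorted tuples into the B-tree in order, building the index pages bottom-up --- read the sorted tuples one by one, fill them into the leaf pages, and build the B-tree upward from the leaves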
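Step 4 can work bottom-up because the tuples are already sorted. Pages are filled left to right, and each upper level only needs the lowest key of every page below it. The C sketch below illustrates that idea under heavy simplifying assumptions. It is not nbtree's real code, which as far as I know lives in src/backend/access/nbtree/nbtsort.c. ToyPage, PAGE_CAP, build_level and toy_bulk_build are hypothetical names. Keys are plain ints, a page holds a fixed four entries, and high keys, sibling links and fillfactor are ignored.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_CAP  4   /* hypothetical: entries per toy page */
#define MAX_PAGES 64  /* hypothetical: size of the toy page pool */

typedef struct {
    int nkeys;
    int keys[PAGE_CAP];     /* leaf: index keys; internal: lowest key of each child */
    int children[PAGE_CAP]; /* internal: child page numbers; -1 on leaf pages */
} ToyPage;

static ToyPage pages[MAX_PAGES];
static int npages = 0;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Pack n sorted entries into consecutive pages, left to right. For every page
 * written, report its lowest key and page number so that the level above can
 * point to it. Returns the number of pages written. */
static int build_level(const int *keys, const int *child, int n,
                       int *up_keys, int *up_pages)
{
    int written = 0;
    for (int i = 0; i < n; i += PAGE_CAP) {
        assert(npages < MAX_PAGES);
        ToyPage *p = &pages[npages];
        p->nkeys = 0;
        for (int j = i; j < n && j < i + PAGE_CAP; j++) {
            p->keys[p->nkeys] = keys[j];
            p->children[p->nkeys] = child ? child[j] : -1;
            p->nkeys++;
        }
        up_keys[written] = p->keys[0];
        up_pages[written] = npages++;
        written++;
    }
    return written;
}

/* Sort the keys in place, fill the leaf level, then add parent levels
 * bottom-up until a single root page remains. Returns the root page number
 * (-1 if n is 0 or too large for the toy pool) and stores the height in *levels. */
static int toy_bulk_build(int *keys, int n, int *levels)
{
    int up_keys[MAX_PAGES], up_pages[MAX_PAGES];
    int cur_keys[MAX_PAGES], cur_pages[MAX_PAGES];

    npages = 0;
    *levels = 0;
    if (n <= 0 || n > PAGE_CAP * MAX_PAGES / 2)
        return -1;

    qsort(keys, (size_t) n, sizeof(int), cmp_int);          /* sort the index tuples */
    int m = build_level(keys, NULL, n, up_keys, up_pages);  /* fill the leaf pages */
    *levels = 1;
    while (m > 1) {                                         /* build upper levels */
        for (int i = 0; i < m; i++) {
            cur_keys[i] = up_keys[i];
            cur_pages[i] = up_pages[i];
        }
        m = build_level(cur_keys, cur_pages, m, up_keys, up_pages);
        (*levels)++;
    }
    return up_pages[0];
}
```

With the ten keys 42, 7, 19, 3, 88, 56, 23, 11, 70 and 35, toy_bulk_build produces three leaf pages (3..19, 23..56, 70..88). It also produces a root page whose entries 3, 23 and 70 point at those leaves, giving a tree of height 2. Because the input is sorted up front, no page ever splits. That is a large part of why a bulk build is cheaper than inserting the same tuples one at a time.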
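1.2 An introduction to locks
PostgreSQL can lock many kinds of objects: tables, individual pages, individual tuples, transaction IDs (both virtual and permanent), general database objects, and more. The kinds of objects that regular locks can be taken on are defined by the LockTagType enum below and appear in the locktype column of pg_locks. When you look up pg_locks by pid, you may see many locks, and not all of them are on tables. Filter on locktype and relation to tell the locks on different objects apart.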
/*
* LOCKTAG is the key information needed to look up a LOCK item in the
* lock hashtable. A LOCKTAG value uniquely identifies a lockable object.
*
* The LockTagType enum defines the different kinds of objects we can lock.
* We can handle up to 256 different LockTagTypes.
*/
typedef enum LockTagType
{
    LOCKTAG_RELATION, /* whole relation */
    LOCKTAG_RELATION_EXTEND, /* the right to extend a relation */
    LOCKTAG_DATABASE_FROZEN_IDS, /* pg_database.datfrozenxid */
    LOCKTAG_PAGE, /* one page of a relation */
    LOCKTAG_TUPLE, /* one physical tuple */
    LOCKTAG_TRANSACTION, /* transaction (for waiting for xact done) */
    LOCKTAG_VIRTUALTRANSACTION, /* virtual transaction (ditto) */
    LOCKTAG_SPECULATIVE_TOKEN, /* speculative insertion Xid and token */
    LOCKTAG_OBJECT, /* non-relation database object */
    LOCKTAG_USERLOCK, /* reserved for old contrib/userlock code */
    LOCKTAG_ADVISORY /* advisory user locks */
} LockTagType;
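The locktype column of pg_locks is the text name of each lock's LockTagType. That is why filtering on locktype separates locks on tables, tuples, transaction IDs and so on. The sketch below pairs the enum above with the names that pg_locks reports, as I understand them. The server keeps a similar LockTagTypeNames array in src/backend/utils/adt/lockfuncs.c, and newer releases may add further tag types. locktype_name and locktag_from_name are hypothetical helpers, not server functions.

```c
#include <assert.h>
#include <string.h>

/* The enum from the excerpt above. */
typedef enum LockTagType
{
    LOCKTAG_RELATION,
    LOCKTAG_RELATION_EXTEND,
    LOCKTAG_DATABASE_FROZEN_IDS,
    LOCKTAG_PAGE,
    LOCKTAG_TUPLE,
    LOCKTAG_TRANSACTION,
    LOCKTAG_VIRTUALTRANSACTION,
    LOCKTAG_SPECULATIVE_TOKEN,
    LOCKTAG_OBJECT,
    LOCKTAG_USERLOCK,
    LOCKTAG_ADVISORY
} LockTagType;

#define LOCKTAG_LAST_TYPE LOCKTAG_ADVISORY

/* pg_locks.locktype strings, indexed by LockTagType. */
static const char *const LockTagTypeNames[] = {
    "relation",      /* LOCKTAG_RELATION */
    "extend",        /* LOCKTAG_RELATION_EXTEND */
    "frozenid",      /* LOCKTAG_DATABASE_FROZEN_IDS */
    "page",          /* LOCKTAG_PAGE */
    "tuple",         /* LOCKTAG_TUPLE */
    "transactionid", /* LOCKTAG_TRANSACTION */
    "virtualxid",    /* LOCKTAG_VIRTUALTRANSACTION */
    "spectoken",     /* LOCKTAG_SPECULATIVE_TOKEN */
    "object",        /* LOCKTAG_OBJECT */
    "userlock",      /* LOCKTAG_USERLOCK */
    "advisory"       /* LOCKTAG_ADVISORY */
};

_Static_assert(sizeof(LockTagTypeNames) / sizeof(LockTagTypeNames[0]) == LOCKTAG_LAST_TYPE + 1,
               "one name per LockTagType");

/* Name shown in pg_locks.locktype for a lock tag type. */
static const char *locktype_name(LockTagType type)
{
    if ((int) type < 0 || (int) type > LOCKTAG_LAST_TYPE)
        return "unknown";
    return LockTagTypeNames[type];
}

/* Reverse lookup, e.g. to classify rows read from pg_locks; -1 if unknown. */
static int locktag_from_name(const char *name)
{
    for (int i = 0; i <= LOCKTAG_LAST_TYPE; i++)
        if (strcmp(LockTagTypeNames[i], name) == 0)
            return i;
    return -1;
}
```

For instance, locktype_name(LOCKTAG_TRANSACTION) returns "transactionid" and locktype_name(LOCKTAG_VIRTUALTRANSACTION) returns "virtualxid". These are the two values that show up in the row-lock example in section 1.4.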
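Below are PostgreSQL's regular lock modes. AccessShareLock, RowShareLock and RowExclusiveLock are weak locks. ShareLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock are strong locks. ShareUpdateExclusiveLock sits between the two groups and belongs to neither. The distinction is used by the fast-path locking optimization. Only weak relation locks are eligible for the fast path (the fastpath column of pg_locks). A backend taking a strong lock must first move other backends' fast-path locks on that relation into the shared lock table.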
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock 0
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL), ANALYZE, CREATE INDEX CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW SHARE */
#define ExclusiveLock 7 /* blocks ROW SHARE/SELECT...FOR UPDATE */
#define AccessExclusiveLock 8 /* ALTER TABLE, DROP TABLE, VACUUM FULL, and unqualified LOCK TABLE */
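Whether two lock modes can be held at the same time by different backends is decided by a conflict table. The sketch below encodes that table as bitmasks. It is modelled on the LockConflicts array in src/backend/storage/lmgr/lock.c, which matches the "Conflicting Lock Modes" table in the PostgreSQL documentation. conflict_tab, modes_conflict, is_weak_mode and is_strong_mode are hypothetical names. The weak/strong helpers follow the fast-path rule: weak modes are those below ShareUpdateExclusiveLock and strong modes those above it.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock mode numbers, as in the #define list above. */
#define NoLock                   0
#define AccessShareLock          1
#define RowShareLock             2
#define RowExclusiveLock         3
#define ShareUpdateExclusiveLock 4
#define ShareLock                5
#define ShareRowExclusiveLock    6
#define ExclusiveLock            7
#define AccessExclusiveLock      8
#define MAX_LOCK_MODE            8

#define LOCKBIT_ON(mode) (1 << (mode))

/* conflict_tab[m] = bitmask of the modes that conflict with mode m. */
static const int conflict_tab[MAX_LOCK_MODE + 1] = {
    [NoLock] = 0,
    [AccessShareLock] = LOCKBIT_ON(AccessExclusiveLock),
    [RowShareLock] = LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
    [RowExclusiveLock] = LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
        LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
    [ShareUpdateExclusiveLock] = LOCKBIT_ON(ShareUpdateExclusiveLock) | LOCKBIT_ON(ShareLock) |
        LOCKBIT_ON(ShareRowExclusiveLock) | LOCKBIT_ON(ExclusiveLock) |
        LOCKBIT_ON(AccessExclusiveLock),
    [ShareLock] = LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
        LOCKBIT_ON(ShareRowExclusiveLock) | LOCKBIT_ON(ExclusiveLock) |
        LOCKBIT_ON(AccessExclusiveLock),
    [ShareRowExclusiveLock] = LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
        LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
        LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
    [ExclusiveLock] = LOCKBIT_ON(RowShareLock) | LOCKBIT_ON(RowExclusiveLock) |
        LOCKBIT_ON(ShareUpdateExclusiveLock) | LOCKBIT_ON(ShareLock) |
        LOCKBIT_ON(ShareRowExclusiveLock) | LOCKBIT_ON(ExclusiveLock) |
        LOCKBIT_ON(AccessExclusiveLock),
    [AccessExclusiveLock] = LOCKBIT_ON(AccessShareLock) | LOCKBIT_ON(RowShareLock) |
        LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
        LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
        LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
};

/* True if a request for `requested` must wait while another backend holds `held`. */
static bool modes_conflict(int held, int requested)
{
    assert(held > NoLock && held <= MAX_LOCK_MODE);
    assert(requested > NoLock && requested <= MAX_LOCK_MODE);
    return (conflict_tab[requested] & LOCKBIT_ON(held)) != 0;
}

/* Fast-path classification; ShareUpdateExclusiveLock is neither weak nor strong. */
static bool is_weak_mode(int mode)   { return mode > NoLock && mode < ShareUpdateExclusiveLock; }
static bool is_strong_mode(int mode) { return mode > ShareUpdateExclusiveLock && mode <= MAX_LOCK_MODE; }
```

Here is how the table reads for the cases in this article. ShareLock conflicts with RowExclusiveLock, so CREATE INDEX and DML block each other. It does not conflict with AccessShareLock, so plain SELECT keeps working. It also does not conflict with itself, so two plain CREATE INDEX statements on the same table can run at the same time. ShareUpdateExclusiveLock does not conflict with RowExclusiveLock, which is what CREATE INDEX CONCURRENTLY relies on.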
1.3 Locks a regular CREATE INDEX takes on the table (ShareLock)
Test with the following statements:
postgres=# create table tab_test_1(id int);
CREATE TABLE
postgres=# insert into tab_test_1 values(1);
INSERT 0 1
postgres=# insert into tab_test_1 values(2);
INSERT 0 1
postgres=# insert into tab_test_1 values(3);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=*# create index idx_1 on tab_test_1(id);
CREATE INDEX
postgres=*#
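A regular CREATE INDEX takes ShareLock (lock mode 5) on the table, as the query below shows. Index creation also requests locks on several system catalogs and their indexes, because creating an index reads and modifies catalog metadata. pg_namespace is accessed because the index is placed in the same namespace (schema) as its parent table.
The catalog locks are AccessShareLock, the weakest mode, which conflicts only with AccessExclusiveLock. It does not stop other transactions from modifying the rows being read; read consistency comes from MVCC snapshots. What it guarantees is that the table or catalog being read cannot be dropped or rewritten while the read is in progress. Such operations take AccessExclusiveLock, for example DROP TABLE or VACUUM FULL.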
postgres=# select l.locktype,ns.nspname,a.relname,a.relkind,l.pid,l.mode,l.granted,p.query_start,p.query,p.state from pg_locks l,pg_stat_activity p,pg_class a,pg_namespace ns where l.locktype='relation' and l.pid=p.pid and query not like '%pg_stat_activity%' and l.relation=a.oid and a.relnamespace=ns.oid;
 locktype |  nspname   |              relname              | relkind |  pid   |      mode       | granted |          query_start          |                 query                 |        state
----------+------------+-----------------------------------+---------+--------+-----------------+---------+-------------------------------+---------------------------------------+---------------------
 relation | public     | tab_test_1                        | r       | 409706 | ShareLock       | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_class                          | r       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_namespace                      | r       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_namespace_nspname_index        | i       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_namespace_oid_index            | i       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_class_oid_index                | i       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_class_relname_nsp_index        | i       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | pg_catalog | pg_class_tblspc_relfilenode_index | i       | 409706 | AccessShareLock | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
(8 rows)
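1.4 Other scenarios that use ShareLock
The source comment above says that CREATE INDEX (WITHOUT CONCURRENTLY) takes ShareLock on the table, which matches our test. But CREATE INDEX is not the only user of ShareLock. When several transactions update the same row, ShareLock is requested too, but on a transaction ID rather than on the table. Every transaction takes ExclusiveLock on its own transaction ID when the ID is assigned. A transaction that must wait for a concurrent update of the same tuple requests ShareLock on the other transaction's ID.
Run the following in two sessions:
//session 1, pid: 434865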
postgres=# begin;
BEGIN
postgres=*# update tab_test_1 set id=7 where id=1;
UPDATE 1
postgres=*#
//session 2, pid: 433290
postgres=# begin;
BEGIN
postgres=*# update tab_test_1 set id=7 where id=1;
Session 2's UPDATE hangs, waiting for session 1. Query the lock status from a third session:
postgres=# select * from pg_locks where pid='434865';
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |       mode       | granted | fastpath | waitstart
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+------------------+---------+----------+-----------
 relation      |    13008 |    16725 |      |       |            |               |         |       |          | 3/16               | 434865 | RowExclusiveLock | t       | t        |
 virtualxid    |          |          |      |       | 3/16       |               |         |       |          | 3/16               | 434865 | ExclusiveLock    | t       | t        |
 transactionid |          |          |      |       |            |          1699 |         |       |          | 3/16               | 434865 | ExclusiveLock    | t       | f        |
(3 rows)
postgres=# select * from pg_locks where pid='433290';
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |       mode       | granted | fastpath |           waitstart
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+------------------+---------+----------+-------------------------------
 relation      |    13008 |    16725 |      |       |            |               |         |       |          | 4/3                | 433290 | RowExclusiveLock | t       | t        |
 virtualxid    |          |          |      |       | 4/3        |               |         |       |          | 4/3                | 433290 | ExclusiveLock    | t       | t        |
 transactionid |          |          |      |       |            |          1700 |         |       |          | 4/3                | 433290 | ExclusiveLock    | t       | f        |
 tuple         |    13008 |    16725 |    0 |     1 |            |               |         |       |          | 4/3                | 433290 | ExclusiveLock    | t       | f        |
 transactionid |          |          |      |       |            |          1699 |         |       |          | 4/3                | 433290 | ShareLock        | f       | f        | 2024-01-04 13:33:16.667889+08
(5 rows)
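In the test above, the last pg_locks row for pid 433290 has granted = 'f'. That backend is blocked while requesting a lock of locktype transactionid on transaction 1699. The output for session 1 shows that transaction ID 1699 belongs to pid 434865, which holds ExclusiveLock on it.
Row-level blocking therefore shows up as transactionid locks. A row lock is recorded by writing the locking transaction's ID into the row itself. Another backend may reach that row and find that the transaction that last modified it is still in progress. It then reads that transaction ID and requests ShareLock on it, which means waiting for that transaction to end. Before this, it has already taken an ExclusiveLock on the tuple itself (the tuple row in the output above). The transaction holding the row lock already holds ExclusiveLock on its own transaction ID, so every later backend that tries to update the row is blocked.
Roughly speaking, ShareLock acts as a read lock and ExclusiveLock as a write lock, and the two conflict. A transaction releases the ExclusiveLock on its own transaction ID only when it commits or aborts. Requesting ShareLock on that ID is therefore exactly a way of waiting for the transaction to finish. Whether other sessions can see the modified row is decided separately by MVCC visibility rules, not by these locks.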
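The transactionid wait described above can be reduced to a toy model. The owner takes ExclusiveLock on its own XID when the XID is assigned. A backend that must wait requests ShareLock on that XID, and the request can only be granted once the owner ends. The C sketch below is a hypothetical, single-threaded simplification that mirrors the pg_locks rows above rather than PostgreSQL's real lock manager. XidLock, xid_assigned, wait_for_xid and xact_end are made-up names. It models one waiter only, with no tuple lock and no deadlock detection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the heavyweight lock on ONE transaction ID.
 * Only the two modes used on XIDs matter here: the owner's ExclusiveLock and
 * a waiter's ShareLock, which conflict with each other. */
typedef struct {
    unsigned xid;        /* the transaction ID being locked */
    int      owner_pid;  /* backend holding ExclusiveLock on xid; 0 once it has ended */
    int      waiter_pid; /* backend whose ShareLock request is pending; 0 if none */
} XidLock;

/* When a transaction is assigned an XID, it takes ExclusiveLock on that XID. */
static void xid_assigned(XidLock *l, unsigned xid, int owner_pid)
{
    l->xid = xid;
    l->owner_pid = owner_pid;
    l->waiter_pid = 0;
}

/* A backend found a row last modified by l->xid and asks for ShareLock on it.
 * Returns true if granted at once (the owner already ended), false if the
 * backend must sleep (shown as granted = f in pg_locks). */
static bool wait_for_xid(XidLock *l, int pid)
{
    if (l->owner_pid != 0) {
        assert(l->waiter_pid == 0);  /* toy model: a single waiter only */
        l->waiter_pid = pid;
        return false;
    }
    return true;  /* nothing to wait for; the waiter can proceed */
}

/* The owner commits or aborts: its ExclusiveLock is released, so the pending
 * ShareLock request can be granted. Returns the pid that was woken, or 0. */
static int xact_end(XidLock *l)
{
    int woken = l->waiter_pid;
    l->owner_pid = 0;
    l->waiter_pid = 0;
    return woken;
}
```

Replaying section 1.4: xid_assigned(&l, 1699, 434865) models session 1's transaction, and wait_for_xid(&l, 433290) returns false, like the granted = 'f' row. Once session 1 commits or rolls back, xact_end wakes pid 433290.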
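1.5 Operations blocked by a regular CREATE INDEX
The following cases show what the ShareLock taken by a regular CREATE INDEX blocks.
1.5.1 A regular CREATE INDEX blocks DML
//Open a session, start a transaction, create the index, and do not commit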
postgres=# begin;
BEGIN
postgres=*# create index idx_1 on tab_test_1(id);
CREATE INDEX
postgres=*#
//Open a new session and query the table: it works normally, reads are not blocked
postgres=# select * from tab_test_1 ;
id
----
1
2
3
(3 rows)
//Open several new sessions and run a DML statement in each
//session A:
postgres=# insert into tab_test_1 values(6);
//session B:
postgres=# update tab_test_1 set id=7 where id=1;
//session C:
postgres=# delete from tab_test_1 where id=1;
//In another session, check the locks that were requested
postgres=# select l.locktype,ns.nspname,a.relname,a.relkind,l.pid,l.mode,l.granted,p.query_start,p.query,p.state from pg_locks l,pg_stat_activity p,pg_class a,pg_namespace ns where l.locktype='relation' and l.pid=p.pid and query not like '%pg_stat_activity%' and l.relation=a.oid and a.relnamespace=ns.oid and a.relname='tab_test_1';
 locktype | nspname |  relname   | relkind |  pid   |       mode       | granted |          query_start          |                 query                  |        state
----------+---------+------------+---------+--------+------------------+---------+-------------------------------+----------------------------------------+---------------------
 relation | public  | tab_test_1 | r       | 409706 | ShareLock        | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id);  | idle in transaction
 relation | public  | tab_test_1 | r       | 410009 | RowExclusiveLock | f       | 2024-01-04 12:48:34.640655+08 | insert into tab_test_1 values(6);      | active
 relation | public  | tab_test_1 | r       | 420570 | RowExclusiveLock | f       | 2024-01-04 12:48:37.450632+08 | update tab_test_1 set id=7 where id=1; | active
 relation | public  | tab_test_1 | r       | 420581 | RowExclusiveLock | f       | 2024-01-04 12:48:39.828598+08 | delete from tab_test_1 where id=1;     | active
(4 rows)
//Then check the blocking source: all DML on the table is blocked by CREATE INDEX.
  pid   | blocked_by |  state  |       wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                       query
--------+------------+---------+-------------------+----------+----------+---------+---------------+----------+----------+------+---------------------------------------------------
 409706 | {}         | idletx  | Client:ClientRead |          | 01:02:44 |       5 |               | postgres | postgres |    3 | [409706] create index idx_1 on tab_test_1(id);
 410009 | {409706}   | waiting | Lock:relation     | 00:01:57 | 00:01:57 |         | 2,147,483,642 | postgres | postgres |    0 | [410009] . insert into tab_test_1 values(6);
 420570 | {409706}   | waiting | Lock:relation     | 00:01:54 | 00:01:54 |         | 2,147,483,642 | postgres | postgres |    0 | [420570] . update tab_test_1 set id=7 where id=1;
 420581 | {409706}   | waiting | Lock:relation     | 00:01:51 | 00:01:51 |         | 2,147,483,642 | postgres | postgres |    0 | [420581] . delete from tab_test_1 where id=1;
(4 rows)
1.5.2 A regular CREATE INDEX blocks DDL
//Open a session, start a transaction, create the index, and do not commit
postgres=# begin;
BEGIN
postgres=*# create index idx_1 on tab_test_1(id);
CREATE INDEX
postgres=*#
//Open a new session and run a DDL statement: ALTER TABLE ... ADD COLUMN
postgres=# alter table tab_test_1 add column name varchar(20);
//In another session, check the locks that were requested
postgres=# select l.locktype,ns.nspname,a.relname,a.relkind,l.pid,l.mode,l.granted,p.query_start,p.query,p.state from pg_locks l,pg_stat_activity p,pg_class a,pg_namespace ns where l.locktype='relation' and l.pid=p.pid and query not like '%pg_stat_activity%' and l.relation=a.oid and a.relnamespace=ns.oid and a.relname='tab_test_1';
 locktype | nspname |  relname   | relkind |  pid   |        mode         | granted |          query_start          |                        query                        |        state
----------+---------+------------+---------+--------+---------------------+---------+-------------------------------+-----------------------------------------------------+---------------------
 relation | public  | tab_test_1 | r       | 409706 | ShareLock           | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id);               | idle in transaction
 relation | public  | tab_test_1 | r       | 422658 | AccessExclusiveLock | f       | 2024-01-04 12:51:52.948432+08 | alter table tab_test_1 add column name varchar(20); | active
(2 rows)
//Check the blocking source: the ALTER TABLE ADD COLUMN DDL on the table is blocked by CREATE INDEX.
  pid   | blocked_by |  state  |       wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                             query
--------+------------+---------+-------------------+----------+----------+---------+---------------+----------+----------+------+----------------------------------------------------------------
 409706 | {}         | idletx  | Client:ClientRead |          | 01:04:09 |       6 |               | postgres | postgres |    1 | [409706] create index idx_1 on tab_test_1(id);
 422658 | {409706}   | waiting | Lock:relation     | 00:00:03 | 00:00:03 |       1 | 2,147,483,641 | postgres | postgres |    0 | [422658] . alter table tab_test_1 add column name varchar(20);
(2 rows)
//Open a new session and run a DDL statement: DROP TABLE
postgres=# drop table tab_test_1;
//In another session, check the locks that were requested
postgres=# select l.locktype,ns.nspname,a.relname,a.relkind,l.pid,l.mode,l.granted,p.query_start,p.query,p.state from pg_locks l,pg_stat_activity p,pg_class a,pg_namespace ns where l.locktype='relation' and l.pid=p.pid and query not like '%pg_stat_activity%' and l.relation=a.oid and a.relnamespace=ns.oid and a.relname='tab_test_1';
 locktype | nspname |  relname   | relkind |  pid   |        mode         | granted |          query_start          |                 query                 |        state
----------+---------+------------+---------+--------+---------------------+---------+-------------------------------+---------------------------------------+---------------------
 relation | public  | tab_test_1 | r       | 409706 | ShareLock           | t       | 2024-01-04 11:48:11.807816+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | public  | tab_test_1 | r       | 422669 | AccessExclusiveLock | f       | 2024-01-04 12:53:17.664053+08 | drop table tab_test_1;                | active
(2 rows)
//Check the blocking source: the DROP TABLE DDL on the table is blocked by CREATE INDEX.
  pid   | blocked_by |  state  |       wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                     query
--------+------------+---------+-------------------+----------+----------+---------+---------------+----------+----------+------+------------------------------------------------
 409706 | {}         | idletx  | Client:ClientRead |          | 01:05:43 |       7 |               | postgres | postgres |    1 | [409706] create index idx_1 on tab_test_1(id);
 422669 | {409706}   | waiting | Lock:relation     | 00:00:12 | 00:00:12 |       1 | 2,147,483,640 | postgres | postgres |    0 | [422669] . drop table tab_test_1;
(2 rows)
1.5.3 Blocking VACUUM, VACUUM FULL and ANALYZE
//Open a session, start a transaction, create the index, and do not commit
postgres=# begin;
BEGIN
postgres=*# create index idx_1 on tab_test_1(id);
CREATE INDEX
postgres=*#
//Open a new session and VACUUM the table
postgres=# vacuum tab_test_1;
//In another session, check the locks that were requested
postgres=# select l.locktype,ns.nspname,a.relname,a.relkind,l.pid,l.mode,l.granted,p.query_start,p.query,p.state from pg_locks l,pg_stat_activity p,pg_class a,pg_namespace ns where l.locktype='relation' and l.pid=p.pid and query not like '%pg_stat_activity%' and l.relation=a.oid and a.relnamespace=ns.oid;
 locktype | nspname |  relname   | relkind |  pid   |           mode           | granted |          query_start          |                 query                 |        state
----------+---------+------------+---------+--------+--------------------------+---------+-------------------------------+---------------------------------------+---------------------
 relation | public  | tab_test_1 | r       | 452619 | ShareLock                | t       | 2024-01-04 14:50:00.524497+08 | create index idx_1 on tab_test_1(id); | idle in transaction
 relation | public  | tab_test_1 | r       | 452585 | ShareUpdateExclusiveLock | f       | 2024-01-04 14:50:35.244644+08 | vacuum tab_test_1 ;                   | active
(2 rows)
//Check the blocking source: VACUUM on the table is blocked by CREATE INDEX.
  pid   | blocked_by |  state  |       wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                     query
--------+------------+---------+-------------------+----------+----------+---------+---------------+----------+----------+------+------------------------------------------------
 452619 | {}         | idletx  | Client:ClientRead |          | 00:00:40 |       1 |               | postgres | postgres |    1 | [452619] create index idx_1 on tab_test_1(id);
 452585 | {452619}   | waiting | Lock:relation     | 00:00:02 | 00:00:02 |         | 2,147,483,646 | postgres | postgres |    0 | [452585] . vacuum tab_test_1 ;
(2 rows)
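VACUUM on the table is blocked by CREATE INDEX. VACUUM FULL and ANALYZE, tested the same way, are blocked as well. VACUUM and ANALYZE request ShareUpdateExclusiveLock and VACUUM FULL requests AccessExclusiveLock, and both modes conflict with ShareLock.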
//Blocking VACUUM FULL
  pid   | blocked_by |  state  |       wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                     query
--------+------------+---------+-------------------+----------+----------+---------+---------------+----------+----------+------+------------------------------------------------
 452619 | {}         | idletx  | Client:ClientRead |          | 00:12:38 |       2 |               | postgres | postgres |    1 | [452619] create index idx_1 on tab_test_1(id);
 452585 | {452619}   | waiting | Lock:relation     | 00:00:07 | 00:00:07 |       1 | 2,147,483,645 | postgres | postgres |    0 | [452585] . vacuum full tab_test_1 ;
(2 rows)
//Blocking ANALYZE
  pid   | blocked_by |  state  |       wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                     query
--------+------------+---------+-------------------+----------+----------+---------+---------------+----------+----------+------+------------------------------------------------
 452619 | {}         | idletx  | Client:ClientRead |          | 00:14:04 |       2 |               | postgres | postgres |    1 | [452619] create index idx_1 on tab_test_1(id);
 452585 | {452619}   | waiting | Lock:relation     | 00:00:07 | 00:00:07 |         | 2,147,483,645 | postgres | postgres |    0 | [452585] . analyze tab_test_1;
(2 rows)
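The results of 1.5.1 to 1.5.3 all follow from one rule: a statement waits if its table-level lock mode conflicts with ShareLock. The C sketch below hard-codes the table-level mode each tested statement requests and checks it against the set of modes that conflict with ShareLock. The modes come from the pg_locks output above and the lock-mode comments quoted earlier. CommandLock, tested_commands and blocked_by_create_index are hypothetical names, and the list covers only the statements tested in this section.

```c
#include <assert.h>
#include <string.h>

/* Lock mode numbers, as in the #define list earlier in this article. */
#define AccessShareLock          1
#define RowShareLock             2
#define RowExclusiveLock         3
#define ShareUpdateExclusiveLock 4
#define ShareLock                5
#define ShareRowExclusiveLock    6
#define ExclusiveLock            7
#define AccessExclusiveLock      8

#define LOCKBIT_ON(mode) (1 << (mode))

/* Modes that conflict with ShareLock, the mode a plain CREATE INDEX holds on the table. */
#define SHARE_LOCK_CONFLICTS \
    (LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) | \
     LOCKBIT_ON(ShareRowExclusiveLock) | LOCKBIT_ON(ExclusiveLock) | \
     LOCKBIT_ON(AccessExclusiveLock))

/* Hypothetical lookup table: the table-level lock mode each tested statement requests. */
typedef struct {
    const char *command;
    int         table_lock;
} CommandLock;

static const CommandLock tested_commands[] = {
    { "select",                 AccessShareLock },
    { "insert",                 RowExclusiveLock },
    { "update",                 RowExclusiveLock },
    { "delete",                 RowExclusiveLock },
    { "alter table add column", AccessExclusiveLock },
    { "drop table",             AccessExclusiveLock },
    { "vacuum",                 ShareUpdateExclusiveLock },
    { "vacuum full",            AccessExclusiveLock },
    { "analyze",                ShareUpdateExclusiveLock },
    { "create index",           ShareLock },
};

/* 1 if the command waits behind an uncommitted plain CREATE INDEX,
 * 0 if it can proceed, -1 if the command is not in the table. */
static int blocked_by_create_index(const char *command)
{
    for (size_t i = 0; i < sizeof(tested_commands) / sizeof(tested_commands[0]); i++)
        if (strcmp(tested_commands[i].command, command) == 0)
            return (SHARE_LOCK_CONFLICTS & LOCKBIT_ON(tested_commands[i].table_lock)) != 0;
    return -1;
}
```

Everything tested above except SELECT comes out as blocked. A second plain CREATE INDEX on the same table is not blocked, because ShareLock does not conflict with itself.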
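1.6 Possible problems when creating a regular B-tree index
Problem:
A regular B-tree CREATE INDEX takes ShareLock on the table, and ShareLock conflicts with RowExclusiveLock. CREATE INDEX therefore has to wait until all in-progress DML (INSERT, UPDATE, DELETE) on the table has finished. In production, the application may keep running and send a constant stream of DML to the table. CREATE INDEX may then appear to hang for a long time, because it cannot get the lock and sits in the lock wait queue. While it waits there, new DML that arrives after it also queues behind its pending ShareLock, so DML stalls even before the index build starts.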
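Even once the lock is acquired and the build starts, all DML on the table stays blocked until it finishes. If the table is large and the indexed column's data is complex, the build can take a long time, and DML is blocked for just as long. Some workloads cannot tolerate that. There is also a second risk. If traffic is heavy during the build, blocked DML sessions pile up and the entries in pg_locks keep growing, which may eventually exhaust memory (OOM).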
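Solutions:
(1) Run index builds when traffic is low or, if possible, within a maintenance window with the application stopped.
(2) Set lock_timeout so that CREATE INDEX does not wait for the lock indefinitely and hold up application traffic.
(3) Use CREATE INDEX CONCURRENTLY (CIC) to build the index online. It lowers the table-level lock to ShareUpdateExclusiveLock, which does not conflict with RowExclusiveLock, so DML is not blocked. In exchange, the table is scanned twice, the build has to wait for concurrent transactions to finish, and the command cannot run inside a transaction block.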
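A waiting CREATE INDEX already hurts, and lock_timeout helps, because of how the lock queue works. In PostgreSQL's lock manager, roughly speaking, a new request that conflicts with a mode already waiting in the queue must also wait. This holds even if the new request is compatible with every lock currently granted. The C sketch below is a hypothetical single-table queue. ToyLock, lock_request, lock_cancel and is_granted are made-up names, only AccessShareLock, RowExclusiveLock and ShareLock are modelled, and deadlock detection and queue reordering are ignored. lock_cancel stands in for a transaction ending or a waiter giving up after lock_timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Only three lock modes are modelled (numbers as in the #define list above). */
#define AccessShareLock  1
#define RowExclusiveLock 3
#define ShareLock        5

#define LOCKBIT_ON(mode) (1 << (mode))
#define MAX_REQUESTS     16

/* Among these three modes, only RowExclusiveLock and ShareLock conflict. */
static int conflict_mask(int mode)
{
    switch (mode) {
    case RowExclusiveLock: return LOCKBIT_ON(ShareLock);
    case ShareLock:        return LOCKBIT_ON(RowExclusiveLock);
    default:               return 0; /* AccessShareLock conflicts with none of them */
    }
}

typedef struct {
    int  pid;
    int  mode;
    bool granted;
    bool active;   /* false once released or abandoned */
} Request;

/* Hypothetical lock on one table: requests are kept in arrival order. */
typedef struct {
    Request req[MAX_REQUESTS];
    int     n;
} ToyLock;

/* Grant only if compatible with every granted mode AND every request still waiting ahead. */
static bool can_grant(const ToyLock *l, int idx)
{
    int granted = 0, waiting_ahead = 0;
    for (int j = 0; j < l->n; j++) {
        const Request *r = &l->req[j];
        if (!r->active || j == idx)
            continue;
        if (r->granted)
            granted |= LOCKBIT_ON(r->mode);
        else if (j < idx)
            waiting_ahead |= LOCKBIT_ON(r->mode);
    }
    return (conflict_mask(l->req[idx].mode) & (granted | waiting_ahead)) == 0;
}

/* Queue a request; returns true if granted immediately, false if it must wait. */
static bool lock_request(ToyLock *l, int pid, int mode)
{
    assert(l->n < MAX_REQUESTS);
    int idx = l->n++;
    l->req[idx] = (Request){ .pid = pid, .mode = mode, .granted = false, .active = true };
    l->req[idx].granted = can_grant(l, idx);
    return l->req[idx].granted;
}

/* Drop pid's request (commit/rollback, or giving up after lock_timeout),
 * then re-check the remaining waiters in queue order. */
static void lock_cancel(ToyLock *l, int pid)
{
    for (int j = 0; j < l->n; j++)
        if (l->req[j].active && l->req[j].pid == pid)
            l->req[j].active = false;
    for (int j = 0; j < l->n; j++)
        if (l->req[j].active && !l->req[j].granted && can_grant(l, j))
            l->req[j].granted = true;
}

static bool is_granted(const ToyLock *l, int pid)
{
    for (int j = 0; j < l->n; j++)
        if (l->req[j].active && l->req[j].pid == pid)
            return l->req[j].granted;
    return false;
}
```

Consider an example. Pid 101 holds RowExclusiveLock from an open DML transaction, and pid 102 (CREATE INDEX) then queues for ShareLock. A SELECT (AccessShareLock) is still granted, but a new INSERT (RowExclusiveLock) queues behind pid 102. Cancelling pid 102, as lock_timeout would, lets the INSERT through at once.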
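Slow index creation:
For common reasons why index creation can be slow, see my other article, 常见的创建索引慢的原因 (Common reasons for slow index creation).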