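The previous article, "PostgreSQL创建索引的锁分析和使用注意" (lock analysis and usage notes for index creation in PostgreSQL), showed that a plain CREATE INDEX blocks DML on the table.
PostgreSQL also offers CREATE INDEX CONCURRENTLY (CIC), which builds the index online and lowers the lock level requested on the table to ShareUpdateExclusiveLock. That lock does not conflict with RowExclusiveLock, so DML on the table is not blocked.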
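1.1 How online index creation (CIC) works
A concurrent index build runs in several steps. First, in one transaction, the index's metadata is written to the system catalogs with the index marked invalid. The table is then scanned twice, and at the end the build has to wait for every transaction that holds a snapshot taken before the second scan to finish. Only then is the index marked usable.
1. Insert the metadata
|Insert the index's metadata into the system catalogs, including pg_class and pg_index, with the index marked INVALID; then start two further transactions, one for each scan
2. First scan
|Start transaction 1 and take the current snapshot, snapshot1
|Before scanning test_tab1, wait for every transaction that has modified test_tab1 (insert, delete, update) to finish
|Scan test_tab1 and build the index in an intermediate (INVALID) state
|End transaction 1
3. Second scan
|Start transaction 2 and take the current snapshot, snapshot2
|Before scanning test_tab1 again, wait for every transaction that has modified test_tab1 (insert, delete, update) to finish
|DML on test_tab1 run by transactions that start after snapshot2 maintains the index idx_1
|Scan test_tab1 again and update the index. (Each tuple carries its version information, so rows changed between snapshot1 and snapshot2 can be merged into the index)
|After that index update finishes, wait for transactions that started before transaction 2 and still hold a snapshot to finish
|Finish the build; the index becomes visible
So CREATE INDEX CONCURRENTLY (CIC) relies on snapshots to do its work. If a long-running transaction keeps holding an old snapshot, CIC cannot acquire the lock it is waiting for on that transaction, and the build can take a very long time.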
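One way to see which of the steps above a running build is in, and which session it is currently waiting for, is the pg_stat_progress_create_index view. The query below is a minimal sketch. It assumes PostgreSQL 12 or later (where this view exists) and that it is run from a separate session while the build is in progress.

```sql
-- Progress of running CREATE INDEX [CONCURRENTLY] commands (PostgreSQL 12+).
-- Run from another session while the build is in progress.
SELECT p.pid,
       p.relid::regclass       AS table_name,
       p.index_relid::regclass AS index_name,
       p.phase,                -- e.g. 'waiting for old snapshots'
       p.lockers_total,        -- sessions the build must wait for in this phase
       p.lockers_done,         -- sessions already waited for
       p.current_locker_pid,   -- session currently being waited for
       p.blocks_done,
       p.blocks_total
FROM pg_stat_progress_create_index AS p;
```

A build that sits in a "waiting for ..." phase for a long time, with current_locker_pid pointing at the same session, is a hint that an old transaction is holding it up.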
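1.2 The lock CIC takes on the table (ShareUpdateExclusiveLock)
As the earlier test showed, creating an ordinary B-tree index on a table blocks DML on that table. PostgreSQL supports online index creation (CREATE INDEX CONCURRENTLY), which does not block other sessions' DML (INSERT, UPDATE, DELETE) on the table being indexed.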
postgres=# begin;
BEGIN
postgres=*# create index concurrently idx_111 on t1(id);
ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
postgres=!#
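As shown above, CREATE INDEX CONCURRENTLY cannot run inside an explicitly opened transaction block. My own environment is also fairly limited, so instead of generating data to simulate the case, I set a breakpoint on the relevant function with gdb and analyzed it from there.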
(gdb) b LockAcquireExtended
Breakpoint 1 at 0xaaaab094d094: file lock.c, line 765.
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y 0x0000aaaab094d094 in LockAcquireExtended at lock.c:765
(gdb) c
Continuing.
Breakpoint 1, LockAcquireExtended (locktag=locktag@entry=0xffffc42c3f08, lockmode=lockmode@entry=1, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0xffffc42c3f00) at lock.c:765
765 {
(gdb) bt
#0 LockAcquireExtended (locktag=locktag@entry=0xffffc42c3f08, lockmode=lockmode@entry=1, sessionLock=sessionLock@entry=false,
dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0xffffc42c3f00) at lock.c:765
#1 0x0000aaaab0949ffc in LockRelationOid (relid=3466, lockmode=1) at lmgr.c:117
#2 0x0000aaaab058c1a8 in relation_open (relationId=relationId@entry=3466, lockmode=lockmode@entry=1) at relation.c:56
#3 0x0000aaaab0aa3fb4 in BuildEventTriggerCache () at evtcache.c:130
#4 EventCacheLookup (event=<optimized out>, event@entry=EVT_SQLDrop) at evtcache.c:69
#5 0x0000aaaab07018c0 in trackDroppedObjectsNeeded () at event_trigger.c:1147
#6 EventTriggerBeginCompleteQuery () at event_trigger.c:1089
#7 0x0000aaaab096cd48 in ProcessUtilitySlow (pstate=pstate@entry=0xaaaac6004f58, pstmt=pstmt@entry=0xaaaac60827d0,
queryString=queryString@entry=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);", context=context@entry=PROCESS_UTILITY_TOPLEVEL,
params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, qc=qc@entry=0xffffc42c4ab8, dest=<optimized out>) at utility.c:1118
#8 0x0000aaaab096c0d4 in standard_ProcessUtility (pstmt=0xaaaac60827d0, queryString=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);",
readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0xaaaac6083000, qc=0xffffc42c4ab8) at utility.c:1078
#9 0x0000ffff90026270 in pgss_ProcessUtility (pstmt=0xaaaac60827d0, queryString=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0xaaaac6083000, qc=0xffffc42c4ab8) at pg_stat_statements.c:1145
#10 0x0000aaaab096a6cc in PortalRunUtility (portal=portal@entry=0xaaaac6104e88, pstmt=pstmt@entry=0xaaaac60827d0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0xaaaac6083000, qc=qc@entry=0xffffc42c4ab8) at pquery.c:1158
#11 0x0000aaaab096a874 in PortalRunMulti (portal=portal@entry=0xaaaac6104e88, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0xaaaac6083000, altdest=altdest@entry=0xaaaac6083000, qc=qc@entry=0xffffc42c4ab8)
at pquery.c:1315
#12 0x0000aaaab096ae00 in PortalRun (portal=portal@entry=0xaaaac6104e88, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0xaaaac6083000, altdest=altdest@entry=0xaaaac6083000, qc=qc@entry=0xffffc42c4ab8) at pquery.c:791
#13 0x0000aaaab0966768 in exec_simple_query (query_string=query_string@entry=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);")
at postgres.c:1274
#14 0x0000aaaab0967648 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4637
#15 0x0000aaaab08c0514 in BackendRun (port=0xaaaac60b6c40, port=0xaaaac60b6c40) at postmaster.c:4464
#16 BackendStartup (port=0xaaaac60b6c40) at postmaster.c:4192
#17 ServerLoop () at postmaster.c:1782
#18 0x0000aaaab08c165c in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0xaaaac5fe9d40) at postmaster.c:1466
#19 0x0000aaaab0578464 in main (argc=1, argv=0xaaaac5fe9d40) at main.c:198
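At the first breakpoint hit above, the backtrace into LockAcquireExtended contains relation_open (relationId=relationId@entry=3466, lockmode=lockmode@entry=1) at relation.c:56. OID 3466 is not the table being indexed but a system catalog (judging by BuildEventTriggerCache in the backtrace, pg_event_trigger), because index creation also reads the system catalogs. The first several breakpoint hits all print lockmode 1, which is AccessShareLock, so these catalogs are only being read (SELECT). With that in mind, keep continuing until lockmode is 4 and the OID is 16725, which shows up as LockRelationOid (relid=relid@entry=16725, lockmode=lockmode@entry=4), because the table we are indexing, tab_test_1, has OID 16725. lockmode 4 is ShareUpdateExclusiveLock.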
postgres=# select oid,relname from pg_class where relname='tab_test_1';
oid | relname
-------+------------
16725 | tab_test_1
(1 row)
765 {
(gdb) c
Continuing.
Breakpoint 1, LockAcquireExtended (locktag=locktag@entry=0xffffc42c3d78, lockmode=lockmode@entry=1, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0xffffc42c3d70) at lock.c:765
765 {
(gdb) c
Continuing.
Breakpoint 1, LockAcquireExtended (locktag=locktag@entry=0xffffc42c3fd8, lockmode=lockmode@entry=4, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0xffffc42c3fd0) at lock.c:765
765 {
(gdb) bt
#0 LockAcquireExtended (locktag=locktag@entry=0xffffc42c3fd8, lockmode=lockmode@entry=4, sessionLock=sessionLock@entry=false,
dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0xffffc42c3fd0) at lock.c:765
#1 0x0000aaaab0949ffc in LockRelationOid (relid=relid@entry=16725, lockmode=lockmode@entry=4) at lmgr.c:117
#2 0x0000aaaab0672290 in RangeVarGetRelidExtended (relation=0xaaaac60825b8, lockmode=lockmode@entry=4, flags=flags@entry=0,
callback=0xaaaab074e5d0 <RangeVarCallbackOwnsRelation>, callback_arg=callback_arg@entry=0x0) at namespace.c:390
#3 0x0000aaaab096d1f8 in ProcessUtilitySlow (pstate=pstate@entry=0xaaaac61ba098, pstmt=pstmt@entry=0xaaaac60827d0,
queryString=queryString@entry=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);", context=context@entry=PROCESS_UTILITY_TOPLEVEL,
params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, qc=qc@entry=0xffffc42c4ab8, dest=<optimized out>) at utility.c:1486
#4 0x0000aaaab096c0d4 in standard_ProcessUtility (pstmt=0xaaaac60827d0, queryString=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);",
readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0xaaaac6083000, qc=0xffffc42c4ab8) at utility.c:1078
#5 0x0000ffff90026270 in pgss_ProcessUtility (pstmt=0xaaaac60827d0, queryString=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0xaaaac6083000, qc=0xffffc42c4ab8) at pg_stat_statements.c:1145
#6 0x0000aaaab096a6cc in PortalRunUtility (portal=portal@entry=0xaaaac6104e88, pstmt=pstmt@entry=0xaaaac60827d0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0xaaaac6083000, qc=qc@entry=0xffffc42c4ab8) at pquery.c:1158
#7 0x0000aaaab096a874 in PortalRunMulti (portal=portal@entry=0xaaaac6104e88, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0xaaaac6083000, altdest=altdest@entry=0xaaaac6083000, qc=qc@entry=0xffffc42c4ab8)
at pquery.c:1315
#8 0x0000aaaab096ae00 in PortalRun (portal=portal@entry=0xaaaac6104e88, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0xaaaac6083000, altdest=altdest@entry=0xaaaac6083000, qc=qc@entry=0xffffc42c4ab8) at pquery.c:791
#9 0x0000aaaab0966768 in exec_simple_query (query_string=query_string@entry=0xaaaac6081b98 "create index concurrently idx_1 on tab_test_1(id);")
at postgres.c:1274
#10 0x0000aaaab0967648 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4637
#11 0x0000aaaab08c0514 in BackendRun (port=0xaaaac60b73e0, port=0xaaaac60b73e0) at postmaster.c:4464
#12 BackendStartup (port=0xaaaac60b73e0) at postmaster.c:4192
#13 ServerLoop () at postmaster.c:1782
#14 0x0000aaaab08c165c in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0xaaaac5fe9d40) at postmaster.c:1466
#15 0x0000aaaab0578464 in main (argc=1, argv=0xaaaac5fe9d40) at main.c:198
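Step with s, then print variable values with p.
In *locktag, locktag_type is 0. In the LockTagType definition the first enumerator defaults to 0 and each following one is the previous value plus 1, so the lock being requested is LOCKTAG_RELATION, a table lock.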
/*
 * The LOCKTAG struct is defined with malice aforethought to fit into 16
 * bytes with no padding. Note that this would need adjustment if we were
 * to widen Oid, BlockNumber, or TransactionId to more than 32 bits.
 *
 * We include lockmethodid in the locktag so that a single hash table in
 * shared memory can store locks of different lockmethods.
 */
typedef struct LOCKTAG
{
    uint32      locktag_field1;         /* a 32-bit ID field */
    uint32      locktag_field2;         /* a 32-bit ID field */
    uint32      locktag_field3;         /* a 32-bit ID field */
    uint16      locktag_field4;         /* a 16-bit ID field */
    uint8       locktag_type;           /* see enum LockTagType */
    uint8       locktag_lockmethodid;   /* lockmethod indicator */
} LOCKTAG;

/*
 * LOCKTAG is the key information needed to look up a LOCK item in the
 * lock hashtable. A LOCKTAG value uniquely identifies a lockable object.
 *
 * The LockTagType enum defines the different kinds of objects we can lock.
 * We can handle up to 256 different LockTagTypes.
 */
typedef enum LockTagType
{
    LOCKTAG_RELATION,               /* whole relation */
    LOCKTAG_RELATION_EXTEND,        /* the right to extend a relation */
    LOCKTAG_DATABASE_FROZEN_IDS,    /* pg_database.datfrozenxid */
    LOCKTAG_PAGE,                   /* one page of a relation */
    LOCKTAG_TUPLE,                  /* one physical tuple */
    LOCKTAG_TRANSACTION,            /* transaction (for waiting for xact done) */
    LOCKTAG_VIRTUALTRANSACTION,     /* virtual transaction (ditto) */
    LOCKTAG_SPECULATIVE_TOKEN,      /* speculative insertion Xid and token */
    LOCKTAG_OBJECT,                 /* non-relation database object */
    LOCKTAG_USERLOCK,               /* reserved for old contrib/userlock code */
    LOCKTAG_ADVISORY                /* advisory user locks */
} LockTagType;
1.3 A case where CIC is blocked (waiting on a vxid)
// session 1: pid 557584
postgres=# begin;
BEGIN
postgres=*# select id from tab_test_2 for update;
id
----
1
(1 row)
// session 2: pid 557784
postgres=# select id from tab_test_2 for update;
(blocked)
// session 3: pid 558431
postgres=# create index concurrently idx_1 on tab_test_1(id);
(blocked)
// session 4:
(1) pg_blocking_pids shows which session is blocking whom
postgres=# select pg_blocking_pids('558431');
pg_blocking_pids
------------------
{557784}
(1 row)
postgres=# select pg_blocking_pids('557784');
pg_blocking_pids
------------------
{557584}
(1 row)
(2) The lock state can also be queried directly, as follows
postgres=# select* from pg_locks
where pid in('557584','557784','558431');
locktype      | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid    | mode                     | granted | fastpath | waitstart
--------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+--------------------------+---------+----------+------------------------------
virtualxid    |          |          |      |       | 5/309      |               |         |       |          | 5/309              | 558431 | ExclusiveLock            | t       | t        |
relation      | 13008    | 24929    |      |       |            |               |         |       |          | 4/46               | 557784 | RowShareLock             | t       | t        |
relation      | 13008    | 24929    |      |       |            |               |         |       |          | 3/202              | 557584 | RowShareLock             | t       | t        |
virtualxid    |          |          |      |       | 3/202      |               |         |       |          | 3/202              | 557584 | ExclusiveLock            | t       | t        |
transactionid |          |          |      |       |            | 1719          |         |       |          | 3/202              | 557584 | ExclusiveLock            | t       | f        |
virtualxid    |          |          |      |       | 4/46       |               |         |       |          | 5/309              | 558431 | ShareLock                | f       | f        | 2024-01-05 13:37:45.5214+08
relation      | 13008    | 24925    |      |       |            |               |         |       |          | 5/309              | 558431 | ShareUpdateExclusiveLock | t       | f        |
tuple         | 13008    | 24929    | 0    | 1     |            |               |         |       |          | 4/46               | 557784 | AccessExclusiveLock      | t       | f        |
transactionid |          |          |      |       |            | 1719          |         |       |          | 4/46               | 557784 | ShareLock                | f       | f        | 2024-01-05 13:37:42.613288+08
virtualxid    |          |          |      |       | 4/46       |               |         |       |          | 4/46               | 557784 | ExclusiveLock            | t       | f        |
(10 rows)
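Both "virtualxid" and "virtualtransaction" literally mean virtual transaction ID. The difference is where they sit in the pg_locks view: "virtualxid" is among the columns that describe the lock target, while "virtualtransaction" is among the columns that describe the holder or waiter. So "virtualxid" says that the locked object is a virtual transaction, and "virtualtransaction" is the virtual transaction ID of the session that holds or waits for the lock.
From the output above, 5/309 is the vxid of the index build itself. The build has to wait for older transactions to finish, so it waits on another session's vxid: the row for pid 558431 with granted = f requests a ShareLock on vxid 4/46. That vxid belongs to the session with pid 557784 (session 2), which is in turn waiting for transaction 1719 held by session 1 (pid 557584). So this blocking is not a wait for a lock on the table; it is a wait for a lock on a vxid.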
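The same lookup can be done in one query instead of reading the full pg_locks listing by eye. The following is a minimal sketch that joins each ungranted virtualxid request to the session that holds that vxid; the column aliases are only illustrative names.

```sql
-- For every session waiting on a virtual transaction ID,
-- show which session owns that vxid and what it is doing.
SELECT w.pid        AS waiting_pid,
       w.virtualxid AS awaited_vxid,
       h.pid        AS holder_pid,
       a.state      AS holder_state,
       a.xact_start AS holder_xact_start,
       a.query      AS holder_query
FROM pg_locks AS w
JOIN pg_locks AS h
  ON h.locktype = 'virtualxid'
 AND h.virtualxid = w.virtualxid
 AND h.granted
JOIN pg_stat_activity AS a
  ON a.pid = h.pid
WHERE w.locktype = 'virtualxid'
  AND NOT w.granted;
```

Every backend holds an ExclusiveLock on its own vxid for the life of its transaction, which is why the holder row can be found with granted = true.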
Use pstack to look at the stack of the session that is currently blocked
root@ubuntu-linux-22-04-desktop:~# pstack 558431
#0 0x0000ffff8faf5ea8 in epoll_pwait () from /lib/aarch64-linux-gnu/libc.so.6
#1 0x0000aaaae03e778c in WaitEventSetWait ()
#2 0x0000aaaae03e7b30 in WaitLatch ()
#3 0x0000aaaae040b428 in ProcSleep ()
#4 0x0000aaaae03fc064 in WaitOnLock ()
#5 0x0000aaaae03fd410 in LockAcquireExtended ()
#6 0x0000aaaae0400f54 in VirtualXactLock ()
#7 0x0000aaaae01c93b4 in WaitForOlderSnapshots ()
#8 0x0000aaaae01cd930 in DefineIndex ()
#9 0x0000aaaae041d270 in ProcessUtilitySlow.constprop.0 ()
#10 0x0000aaaae041c0d4 in standard_ProcessUtility ()
#11 0x0000ffff8f426270 in pgss_ProcessUtility () from /home/postgres/soft-16/lib/pg_stat_statements.so
#12 0x0000aaaae041a6cc in PortalRunUtility ()
#13 0x0000aaaae041a874 in PortalRunMulti ()
#14 0x0000aaaae041ae00 in PortalRun ()
#15 0x0000aaaae0416768 in exec_simple_query ()
#16 0x0000aaaae0417648 in PostgresMain ()
#17 0x0000aaaae0370514 in ServerLoop ()
#18 0x0000aaaae037165c in PostmasterMain ()
#19 0x0000aaaae0028464 in main ()
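DefineIndex() mainly handles the index-creation logic, while regular lock acquisition is implemented in LockAcquire() and LockAcquireExtended(). The innermost frames of the stack (WaitOnLock, ProcSleep, WaitLatch) show that the backend is waiting.
Looking at the variables there, relationId=24925 is the table the index is being created on, while locktag_type = 6. By the LockTagType definition, 6 is LOCKTAG_VIRTUALTRANSACTION, so the lock is being requested on a virtual transaction, and its mode is 5, ShareLock. This may look contradictory: earlier we said that CIC lowers the lock by one level compared with a plain CREATE INDEX, to mode 4, yet here we see mode 5. The two are different locks. The locktag_type of this request is not relation but virtual transaction, and the source comments that explain the regular lock modes are also written with relations in mind.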
postgres=# select '24925'::regclass;
regclass
------------
tab_test_1
(1 row)
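So to see the real table lock taken during CIC, find the LockAcquireExtended() call for relation OID 24925 where locktag->locktag_type is 0, and check lockmode there. Setting the breakpoint with gdb once more caught the lock requested on the table, and it is indeed mode 4, ShareUpdateExclusiveLock.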
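The relation-level lock can also be cross-checked without gdb while the build is running or blocked, by filtering pg_locks down to relation locks on the table. This is a minimal sketch that assumes it is run in the same database as the build and uses the table name tab_test_1 from this test.

```sql
-- Relation-level locks currently held or awaited on the table being indexed.
SELECT l.pid,
       l.locktype,
       l.relation::regclass AS relation,
       l.mode,
       l.granted
FROM pg_locks AS l
WHERE l.locktype = 'relation'
  AND l.relation = 'tab_test_1'::regclass
  AND l.database = (SELECT oid
                    FROM pg_database
                    WHERE datname = current_database());
```

In the pg_locks listing earlier in this section, this corresponds to the row for relation 24925 held by pid 558431 in mode ShareUpdateExclusiveLock.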
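1.4 Problems you may run into with CIC
Because CREATE INDEX CONCURRENTLY does not block other sessions' DML (INSERT, UPDATE, DELETE) on the table being indexed, it is often chosen over a plain CREATE INDEX so that the application is not blocked.
Problems:
1. Slow execution
As its implementation shows, CIC has to scan the table twice, so even without any lock waits it can take much longer than a normal index build.
2. A failed build may leave an INVALID index behind
Once the first scan has built the intermediate (INVALID) index, that index already takes part in subsequent DML. If the build fails during the second scan phase, the index keeps affecting DML (performance, constraints) until it is dropped or rebuilt.
3. Conflicts: no simultaneous builds on one table
Two CREATE INDEX CONCURRENTLY builds cannot run on the same table at the same time, because the operation takes ShareUpdateExclusiveLock (mode 4) on the table, and that mode conflicts with itself.
Solutions:
Problem 1:
The double table scan is inherent to the mechanism, so the best you can do is run the build when the workload is light. Beyond that, try to avoid blocking:
- While the index is being built, avoid long transactions that start before either scan and modify the table being indexed.
- Avoid opening long transactions before the second scan, even ones that do not touch this table (as in the case in 1.3).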
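Before starting a build, it can help to check for transactions that have already been open for a while. The query below is a minimal sketch; the 5-minute threshold is an arbitrary assumption and should be tuned to the workload.

```sql
-- Sessions whose current transaction has been open longer than a threshold.
-- The '5 minutes' cutoff is only an example value.
SELECT pid,
       usename,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       backend_xid,
       backend_xmin,
       left(query, 60)    AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid <> pg_backend_pid()
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```

Sessions in state "idle in transaction" deserve particular attention, since they keep their transaction open without doing any work.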
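Problem 2:
Because of how CIC works, a failed build may leave an invalid index behind. An invalid index cannot be used by queries, yet it is still updated by every DML statement, which wastes resources on the host. Check the indisvalid column of the pg_index system catalog: if it is true, the index can currently be used for queries; if it is false, the index may be incomplete and has to be dealt with, either by rebuilding it or by dropping it, depending on your needs.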
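A minimal sketch of that check and cleanup follows. The index name idx_1 is the one used in this test; REINDEX ... CONCURRENTLY assumes PostgreSQL 12 or later.

```sql
-- List indexes that are currently marked invalid,
-- for example after a failed CREATE INDEX CONCURRENTLY.
SELECT i.indexrelid::regclass AS index_name,
       i.indrelid::regclass   AS table_name,
       i.indisvalid,
       i.indisready
FROM pg_index AS i
WHERE NOT i.indisvalid;

-- Then either drop the leftover index without blocking DML ...
DROP INDEX CONCURRENTLY IF EXISTS idx_1;

-- ... or rebuild it in place instead (PostgreSQL 12+):
-- REINDEX INDEX CONCURRENTLY idx_1;
```

Like CREATE INDEX CONCURRENTLY, DROP INDEX CONCURRENTLY and REINDEX ... CONCURRENTLY cannot run inside a transaction block.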
Problem 3:
Do not run CREATE INDEX CONCURRENTLY on the same table from several sessions at once, to avoid conflicts.