PostgreSQL pg-xact（clog）目录文件缺失处理

一、背景

前些天晚上突然收到业务反馈，查询DB中的一个表报错

Could not open file "pg-xact/005E": No such file or directory.

两眼一黑难道是文件损坏了...登录查看DB日志，还好没有其他报错，业务也反馈只有这一个表在从库查询报错，主库正常，于是开始分析处理。

二、解决方法

根据搜索结果看，这个问题基本就两种处理方法，这里先列出来。

1. 数据库恢复

如果数据库小，或者对数据一致性要求高，建议的方法还是进行恢复，因为方法二实际对数据一致性是有损的。

2. 新建全0文件

报错是文件没了，所以解决方法是手动建回去（记得先确认权限）

#postgres用户执行
dd if=/dev/zero of=$PGDATA/pg_xact/005E bs=256k count=1
chmod 600 $PGDATA/pg_xact/005E

风险

clog存事务的最终状态信息，就是这个事务最终是提交了还是回滚了，建一个空的文件，相当于文件里的事务不知道它提交还是回滚了，理论上会出现数据不一致。

为什么这里建全0文件可以修复，从源码中可以看到，CLOG的日志段创建函数如下：

/*
 * This func must be called ONCE on system install.  It creates
 * the initial CLOG segment.  (The CLOG directory is assumed to
 * have been created by initdb, and CLOGShmemInit must have been
 * called already.)
 */
void
BootStrapCLOG(void)
{
	int			slotno;

	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);

	/* Create and zero the first page of the commit log */
	slotno = ZeroCLOGPage(0, false);

	/* Make sure it's written out */
	SimpleLruWritePage(XactCtl, slotno);
	Assert(!XactCtl->shared->page_dirty[slotno]);

	LWLockRelease(XactSLRULock);
}

主要为两个函数：

ZeroCLOGPage函数：将该可用缓冲区初始化为全0
SimpleLruWritePage函数：会再调用SlruInternalWritePage函数将页由缓冲区写入磁盘

三、 CLOG是什么

1. 原理及作用

postgresql源码学习（51）—— 提交日志CLOG 原理用途管理函数-CSDN博客

2. CLOG如何删除

报错是clog没了，所以这里再学习一下CLOG的删除原理

TruncateCLOG函数：调用WriteTruncateXlogRec删除过期的CLOG
是否过期根据当前数据库中最旧事务id oldestXact判断

/*
 * Remove all CLOG segments before the one holding the passed transaction ID
 *
 * Before removing any CLOG data, we must flush XLOG to disk, to ensure
 * that any recently-emitted FREEZE_PAGE records have reached disk; otherwise
 * a crash and restart might leave us with some unfrozen tuples referencing
 * removed CLOG data.  We choose to emit a special TRUNCATE XLOG record too.
 * Replaying the deletion from XLOG is not critical, since the files could
 * just as well be removed later, but doing so prevents a long-running hot
 * standby server from acquiring an unreasonably bloated CLOG directory.
 *
 * Since CLOG segments hold a large number of transactions, the opportunity to
 * actually remove a segment is fairly rare, and so it seems best not to do
 * the XLOG flush unless we have confirmed that there is a removable segment.
 */
void
TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
{
	int			cutoffPage;

	/*
	 * The cutoff point is the start of the segment containing oldestXact. We
	 * pass the *page* containing oldestXact to SimpleLruTruncate.
	 */
	cutoffPage = TransactionIdToPage(oldestXact);

	/* Check to see if there's any files that could be removed */
	if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
		return;					/* nothing to remove */

	/*
	 * Advance oldestClogXid before truncating clog, so concurrent xact status
	 * lookups can ensure they don't attempt to access truncated-away clog.
	 *
	 * It's only necessary to do this if we will actually truncate away clog
	 * pages.
	 */
	AdvanceOldestClogXid(oldestXact);

	/*
	 * Write XLOG record and flush XLOG to disk. We record the oldest xid
	 * we're keeping information about here so we can ensure that it's always
	 * ahead of clog truncation in case we crash, and so a standby finds out
	 * the new valid xid before the next checkpoint.
	 */
	WriteTruncateXlogRec(cutoffPage, oldestXact, oldestxid_datoid);

	/* Now we can remove the old CLOG segment(s) */
	SimpleLruTruncate(XactCtl, cutoffPage);
}

AdvanceOldestClogXid函数：对比 clog中的事务id 与当前数据库最旧事务id oldestXact，如果clog中的更旧，将其更新为oldestXact，避免后续并发事务访问到已被清理的clog。

/*
 * Advance the cluster-wide value for the oldest valid clog entry.
 *
 * We must acquire XactTruncationLock to advance the oldestClogXid. It's not
 * necessary to hold the lock during the actual clog truncation, only when we
 * advance the limit, as code looking up arbitrary xids is required to hold
 * XactTruncationLock from when it tests oldestClogXid through to when it
 * completes the clog lookup.
 */
void
AdvanceOldestClogXid(TransactionId oldest_datfrozenxid)
{
	LWLockAcquire(XactTruncationLock, LW_EXCLUSIVE);
	if (TransactionIdPrecedes(ShmemVariableCache->oldestClogXid,
							  oldest_datfrozenxid))
	{
		ShmemVariableCache->oldestClogXid = oldest_datfrozenxid;
	}
	LWLockRelease(XactTruncationLock);
}

四、遗留问题

这个报错为什么会突然出现，目前还没查到相关资料

1. 现场情况

报错PG版本为10.5，从库查询报错，主库不报错
报错中要找的clog为005E，txid为 99185979
报错数据库最旧的clog为0269，并且已经是两个月前的

报错数据库当前txid为 9436969370，远远大于报错的

一个小补充：为什么查出来的txid是94亿多，远远大于了42亿，实际上txid的计算有一个转换，转完之后大概是8亿多。

select (txid_current() % (2^32)::bigint)::text::xid;

当时数据库中最老的表年龄大概是1.9亿，并且不是报错的这个表

2. 一些猜测

从现场情况来看，005E文件应该是PG正常删除，而非误删或文件系统损坏导致不可访问。

奇怪的是为什么查询该表会访问这么旧的文件，从事务年龄来看，这不应该是个正常的访问，比较怀疑还是遇到了bug。

参考

《PostgreSQL数据库内核分析》第7章

PostgresQL-丢失各种数据文件如何恢复 - binbinx - 博客园

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：/a/913083.html

如若内容造成侵权/违法违规/事实不符，请联系我们进行投诉反馈qq邮箱809451989@qq.com，一经查实，立即删除！