一、 背景
前些天晚上突然收到业务反馈,查询DB中的一个表报错
Could not open file "pg-xact/005E": No such file or directory.
两眼一黑难道是文件损坏了...登录查看DB日志,还好没有其他报错,业务也反馈只有这一个表在从库查询报错,主库正常,于是开始分析处理。
二、 解决方法
根据搜索结果看,这个问题基本就两种处理方法,这里先列出来。
1. 数据库恢复
如果数据库小,或者对数据一致性要求高,建议的方法还是进行恢复,因为方法二实际对数据一致性是有损的。
2. 新建全0文件
报错是文件没了,所以解决方法是手动建回去(记得先确认权限)
#postgres用户执行
dd if=/dev/zero of=$PGDATA/pg_xact/005E bs=256k count=1
chmod 600 $PGDATA/pg_xact/005E
风险
clog存事务的最终状态信息,就是这个事务最终是提交了还是回滚了,建一个空的文件,相当于文件里的事务不知道它提交还是回滚了,理论上会出现数据不一致。
为什么这里建全0文件可以修复,从源码中可以看到,CLOG的日志段创建函数如下:
/*
* This func must be called ONCE on system install. It creates
* the initial CLOG segment. (The CLOG directory is assumed to
* have been created by initdb, and CLOGShmemInit must have been
* called already.)
*/
void
BootStrapCLOG(void)
{
int slotno;
LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
/* Create and zero the first page of the commit log */
slotno = ZeroCLOGPage(0, false);
/* Make sure it's written out */
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
LWLockRelease(XactSLRULock);
}
主要为两个函数:
- ZeroCLOGPage函数:将该可用缓冲区初始化为全0
- SimpleLruWritePage函数:会再调用SlruInternalWritePage函数将页由缓冲区写入磁盘
三、 CLOG是什么
1. 原理及作用
postgresql源码学习(51)—— 提交日志CLOG 原理 用途 管理函数-CSDN博客
2. CLOG如何删除
报错是clog没了,所以这里再学习一下CLOG的删除原理
- TruncateCLOG函数:调用WriteTruncateXlogRec删除过期的CLOG
- 是否过期根据当前数据库中最旧事务id oldestXact判断
/*
* Remove all CLOG segments before the one holding the passed transaction ID
*
* Before removing any CLOG data, we must flush XLOG to disk, to ensure
* that any recently-emitted FREEZE_PAGE records have reached disk; otherwise
* a crash and restart might leave us with some unfrozen tuples referencing
* removed CLOG data. We choose to emit a special TRUNCATE XLOG record too.
* Replaying the deletion from XLOG is not critical, since the files could
* just as well be removed later, but doing so prevents a long-running hot
* standby server from acquiring an unreasonably bloated CLOG directory.
*
* Since CLOG segments hold a large number of transactions, the opportunity to
* actually remove a segment is fairly rare, and so it seems best not to do
* the XLOG flush unless we have confirmed that there is a removable segment.
*/
void
TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
{
int cutoffPage;
/*
* The cutoff point is the start of the segment containing oldestXact. We
* pass the *page* containing oldestXact to SimpleLruTruncate.
*/
cutoffPage = TransactionIdToPage(oldestXact);
/* Check to see if there's any files that could be removed */
if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
return; /* nothing to remove */
/*
* Advance oldestClogXid before truncating clog, so concurrent xact status
* lookups can ensure they don't attempt to access truncated-away clog.
*
* It's only necessary to do this if we will actually truncate away clog
* pages.
*/
AdvanceOldestClogXid(oldestXact);
/*
* Write XLOG record and flush XLOG to disk. We record the oldest xid
* we're keeping information about here so we can ensure that it's always
* ahead of clog truncation in case we crash, and so a standby finds out
* the new valid xid before the next checkpoint.
*/
WriteTruncateXlogRec(cutoffPage, oldestXact, oldestxid_datoid);
/* Now we can remove the old CLOG segment(s) */
SimpleLruTruncate(XactCtl, cutoffPage);
}
- AdvanceOldestClogXid函数:对比 clog中的事务id 与 当前数据库最旧事务id oldestXact,如果clog中的更旧,将其更新为oldestXact,避免后续并发事务访问到已被清理的clog。
/*
* Advance the cluster-wide value for the oldest valid clog entry.
*
* We must acquire XactTruncationLock to advance the oldestClogXid. It's not
* necessary to hold the lock during the actual clog truncation, only when we
* advance the limit, as code looking up arbitrary xids is required to hold
* XactTruncationLock from when it tests oldestClogXid through to when it
* completes the clog lookup.
*/
void
AdvanceOldestClogXid(TransactionId oldest_datfrozenxid)
{
LWLockAcquire(XactTruncationLock, LW_EXCLUSIVE);
if (TransactionIdPrecedes(ShmemVariableCache->oldestClogXid,
oldest_datfrozenxid))
{
ShmemVariableCache->oldestClogXid = oldest_datfrozenxid;
}
LWLockRelease(XactTruncationLock);
}
四、 遗留问题
这个报错为什么会突然出现,目前还没查到相关资料
1. 现场情况
- 报错PG版本为10.5,从库查询报错,主库不报错
- 报错中要找的clog为005E,txid为 99185979
- 报错数据库最旧的clog为0269,并且已经是两个月前的
- 报错数据库当前txid为 9436969370,远远大于报错的
一个小补充:为什么查出来的txid是94亿多,远远大于了42亿,实际上txid的计算有一个转换,转完之后大概是8亿多。
select (txid_current() % (2^32)::bigint)::text::xid;
- 当时数据库中最老的表年龄大概是1.9亿,并且不是报错的这个表
2. 一些猜测
从现场情况来看,005E文件应该是PG正常删除,而非误删或文件系统损坏导致不可访问。
奇怪的是为什么查询该表会访问这么旧的文件,从事务年龄来看,这不应该是个正常的访问,比较怀疑还是遇到了bug。
参考
《PostgreSQL数据库内核分析》第7章
PostgresQL-丢失各种数据文件如何恢复 - binbinx - 博客园