postgreSQL中的高速缓存

1. 高速缓存简介

如下图所示，当一个postgreSQL进程读取一个元组时，需要获取表的基本信息（例如：表的oid、索引信息和统计信息等）及元组的模式信息，这些信息被分别记录在多个系统表中。通常一个表的模式信息在设定好后的变化频率很低，因此在对同一个表的多个元组操作时，每次都去读取系统表的元组来构建模式信息显然是没有必要的，这也会降低元组的操作效率。为了减少对系统表的访问，在每个进程的本地内存区域设置了两种cache，一种是用来存储系统表的元组，一种是用来存储表的基本信息，从而可以让进程更快的构建出表的基本信息和元组的模式信息。cache在某一个进程对系统表发生更改时其他的 backend 进程要能够感知到，需要有一套维护cache 一致性的机制，也就是 PG 的 InvalidMessage机制。

用户表是如何被管理的，参考：https://zhuanlan.zhihu.com/p/623283855

在这里插入图片描述

2. SysCache

syscache主要用来缓存最近使用过的系统表的元组。从代码实现看，syscache就是一个catcache数组，数组的长度为系统表的个数，每一个系统表唯一的对应catcache数组的一个元素。

catcache数据结构

typedef struct catcache
{
	int			id;				/* catcache id */
	int			cc_nbuckets;	/* # of hash buckets in this cache */
	TupleDesc	cc_tupdesc;		/* tuple descriptor (copied from reldesc) */
	dlist_head *cc_bucket;		/* hash buckets */
	CCHashFN	cc_hashfunc[CATCACHE_MAXKEYS];	/* hash function for each key */
	CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS];	/* fast equal function for
													 * each key */
	int			cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
	dlist_head	cc_lists;		/* list of CatCList structs */
	int			cc_ntup;		/* # of tuples currently in this cache */
	int			cc_nkeys;		/* # of keys (1..CATCACHE_MAXKEYS) */
	const char *cc_relname;		/* name of relation the tuples come from */
	Oid			cc_reloid;		/* OID of relation the tuples come from */
	Oid			cc_indexoid;	/* OID of index matching cache keys */
	bool		cc_relisshared; /* is relation shared across databases? */
	slist_node	cc_next;		/* list link */
	ScanKeyData cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap
											 * scans */

	/*
	 * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
	 * doesn't break ABI for other modules
	 */
#ifdef CATCACHE_STATS
	long		cc_searches;	/* total # searches against this cache */
	long		cc_hits;		/* # of matches against existing entry */
	long		cc_neg_hits;	/* # of matches against negative entry */
	long		cc_newloads;	/* # of successful loads of new entry */

	/*
	 * cc_searches - (cc_hits + cc_neg_hits + cc_newloads) is number of failed
	 * searches, each of which will result in loading a negative entry
	 */
	long		cc_invals;		/* # of entries invalidated from cache */
	long		cc_lsearches;	/* total # list-searches */
	long		cc_lhits;		/* # of matches against existing lists */
#endif
} CatCache;

2.1 syscache初始化

在对postgres进程初始化时，会对syscache进行初始化，将查找系统表元组的关键信息写入到catcache数组的元素中。

涉及到的数据结构如下：

cacheinfo：存储所有系统表的catcache描述信息

struct cachedesc
{
	Oid			reloid;			/* OID of the relation being cached */
	Oid			indoid;			/* OID of index relation for this cache */
	int			reloidattr;		/* attr number of rel OID reference, or 0 */
	int			nkeys;			/* # of keys needed for cache lookup */
	int			key[4];			/* attribute numbers of key attrs */
	int			nbuckets;		/* number of hash buckets for this cache */
};

static const struct cachedesc cacheinfo[] = {
	{AggregateRelationId,		/* AGGFNOID */
		AggregateFnoidIndexId,
		1,
		{
			Anum_pg_aggregate_aggfnoid,
			0,
			0,
			0
		},
		16
	},
	...
	
}

catcacheheader：catcache使用cc_next字段构成一个单向链表，头部使用此结构体记录

typedef struct catcacheheader
{
	slist_head	ch_caches;		/* head of list of CatCache structs */
	int			ch_ntup;		/* # of tuples in all caches */
} CatCacheHeader;

初始化阶段1：使用cacheinfo初始化catcache数组

typedef struct catcache
{
	...
	TupleDesc	cc_tupdesc;		/* tuple descriptor (copied from reldesc) */
	int			cc_nbuckets;	/* # of hash buckets in this cache */
	dlist_head *cc_bucket;		/* hash buckets */
	int			cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
	int			cc_nkeys;		/* # of keys (1..CATCACHE_MAXKEYS) */
	Oid			cc_reloid;		/* OID of relation the tuples come from */
	Oid			cc_indexoid;	/* OID of index matching cache keys */
	...
}

初始化阶段2：根据对应的系统表填充catcache中元组描述信息（cc_tupdesc）、系统表名（cc_relname）和查找关键字的相关字段

typedef struct catcache
{
	...
	CCHashFN	cc_hashfunc[CATCACHE_MAXKEYS];	/* hash function for each key */
	CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS];	/* fast equal function for each key */
	const char *cc_relname;		/* name of relation the tuples come from */
	bool		cc_relisshared; /* is relation shared across databases? */
	ScanKeyData cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap scans */
	...
}

2.2 catcache中缓存元组的组织

每个catcache元素中cc_bucket数组是一个Hash桶数组，元组的键值可以通过hash函数映射到cc_bucket数组的下标。每个hash桶都被组织成一个双向链表Dllist，其中的节点为Dlelem类型，Dlelem是一个包装过的缓存元组，其dle_val字段指向一个CatCTup形式的缓存元组。
在这里插入图片描述

CatCache中的缓存元组将先包装成CatCTup形式，然后再包装成Dlelem形式，最后加入到其所在的hash桶链表中。

typedef struct dlist_node dlist_node;
struct dlist_node
{
	dlist_node *prev;
	dlist_node *next;
};

typedef struct catctup
{
	int			ct_magic;		/* for identifying CatCTup entries */
#define CT_MAGIC   0x57261502
	uint32		hash_value;		/* hash value for this tuple's keys */
	Datum		keys[CATCACHE_MAXKEYS];
	dlist_node	cache_elem;		/* list member of per-bucket list */
	int			refcount;		/* number of active references */
	bool		dead;			/* dead but not yet removed? 标记删除*/
	bool		negative;		/* negative cache entry? 表示实际并不存在的元组*/
	HeapTupleData tuple;		/* tuple management header */
	struct catclist *c_list;	/* containing CatCList, or NULL if none */

	CatCache   *my_cache;		/* link to owning catcache */
} CatCTup;

2.3 在catcache中查找元组

在catcache查找元组有两种方式：精确匹配和部分匹配。

精确匹配

精确匹配由SearchCatCache函数实现：

HeapTuple
SearchCatCache(CatCache *cache,
			   Datum v1,
			   Datum v2,
			   Datum v3,
			   Datum v4);

首先遍历catcacheheader链表，根据系统表名称或者oid查找到系统表对应的catcache元素。
查找元组键值进行hash，根据hash值找到catcache在cc_bucket数组中对应的hash桶下标。
遍历hash桶链表，找到满足需求的Dlelem，并将其结构体中dle_val强制转换为CatCTup类型，CatCTup中的HeapTupleData就是要查找的元组的头部。
将该Dlelem移动到hash桶链表的头部，并将catcache的cc_hits加1。
如果在hash桶链表中没有找到满足条件的元组，需要进一步扫描物理系统表：
- 如果在物理系统表中查找到元组，将元组包装成Dlelem，添加到hash桶链表的头部；
- 否则，说明元组不存在，构建一个“负元组”，并将它包装好，添加到hash桶链表的头部。

2. 部分匹配

部分匹配由SearchCatCacheList实现：

SearchCatCacheList(CatCache *cache,
				   int nkeys,
				   Datum v1,
				   Datum v2,
				   Datum v3)

该函数返回一个CatCList数据结构，返回的所有结果通过链表的方式管理。

typedef struct catclist
{
	int			cl_magic;		/* for identifying CatCList entries */
#define CL_MAGIC   0x52765103

	uint32		hash_value;		/* hash value for lookup keys */

	dlist_node	cache_elem;		/* list member of per-catcache list */

	/*
	 * Lookup keys for the entry, with the first nkeys elements being valid.
	 * All by-reference are separately allocated.
	 */
	Datum		keys[CATCACHE_MAXKEYS];

	int			refcount;		/* number of active references */
	bool		dead;			/* dead but not yet removed? */
	bool		ordered;		/* members listed in index order? */
	short		nkeys;			/* number of lookup keys specified */
	int			n_members;		/* number of member tuples */
	CatCache   *my_cache;		/* link to owning catcache */
	CatCTup    *members[FLEXIBLE_ARRAY_MEMBER]; /* members */
} CatCList;

查找过程：
在这里插入图片描述

3. RelCache

RelCache存放的不是元组，而是RelationData数据，每一个RelationData结构表示一个表的模式信息，这些信息由系统表元组中的信息构造而来。

typedef struct RelationData
{
	RelFileNode rd_node;		/* relation physical identifier */
	struct SMgrRelationData *rd_smgr;	/* 表的文件句柄 */
。	...
	Form_pg_class rd_rel;		/* 表在pg_class系统表中对应的元组里的信息 */
	TupleDesc	rd_att;			/* 表的元组描述符，描述了表的各个属性 */
	Oid			rd_id;			/* relation's object id */
	List	   *rd_indexlist;	/* list of OIDs of indexes on relation */
	Bitmapset  *rd_indexattr;	/* identifies columns used in indexes */
	Oid			rd_oidindex;	/* OID of unique index on OID, if any */
	...
	Form_pg_index rd_index;		/* pg_index tuple describing this index */
	...
} RelationData;

由于RelationData数据结构是不变的，采用了hash表维持这个结构。这个hash表也是 PG 内部应用最多最广的 hash 数据结构，其性能和稳定性在PostgreSQL 近三十年的生涯中历经磨练。这个 hash表的实现也是非常值得学习的工业级数据结。

动态hash表介绍参考：https://zhmin.github.io/posts/postgresql-dynamic-hash/

3.1 relcache初始化

初始化阶段1：调用RelationCacheInitialize函数进行初始化，创建hash表。

初始化阶段2：将必要的系统表和系统表索引的模式加入到RelCache中，包括pg_class、pg_attribute、pg_proc、pg_type。

3.2 relcache的操作

插入新打开的表

当打开新表时，需要把RelationData加入到RelCache中，该操作通过宏RelationCacheInsert来实现。

查找hash表

查找hash表通过宏定义RelationIdCacheLookup来实现，调用函数hash_search。

relation_open
	RelationIdGetRelation
		RelationIdCacheLookup(relationId, rd);
		RelationBuildDesc -- no reldesc in the cache, RelationBuildDesc() build one and add it.
			RelationBuildTupleDesc
			....
			RelationCacheInsert(relation);