1 前言
- 本文件的源码来自PostgreSQL 14.5,其它版本略有不同
- 并行workers并不能显箸提升性能。个人不建议使用并行worker进程,大多数情况下采用postgresql.conf默认配置即可。
PostgreSQL的并行workers是由compute_parallel_worker函数决定的,compute_parallel_worker是估算扫描所需的并行工作线程数,并不是您在postgresql.conf中设置的max_parallel_workers_per_gather数量,compute_parallel_worker会根据heap_pages、index_pages、max_workers(max_parallel_workers_per_gather)来决定并行工作线程数量。
2 源码和调用位置
compute_parallel_worker共有4个地方调用
src\backend\optimizer\path\allpaths.c(801,21)
src\backend\optimizer\path\allpaths.c(3724,21)
src\backend\optimizer\path\costsize.c(707,33)
src\backend\optimizer\plan\planner.c(5953,21)
compute_parallel_worker的声明
src\include\optimizer\paths.h(59,12)
compute_parallel_worker的实现
src\backend\optimizer\path\allpaths.c(3750,1)
compute_parallel_worker的源码
/*
* Compute the number of parallel workers that should be used to scan a
* relation. We compute the parallel workers based on the size of the heap to
* be scanned and the size of the index to be scanned, then choose a minimum
* of those.
*
* "heap_pages" is the number of pages from the table that we expect to scan, or
* -1 if we don't expect to scan any.
*
* "index_pages" is the number of pages from the index that we expect to scan, or
* -1 if we don't expect to scan any.
*
* "max_workers" is caller's limit on the number of workers. This typically
* comes from a GUC.
* "max_workers"就是postgresql.conf中max_parallel_workers_per_gather的值
*/
int
compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
int max_workers)
{
int parallel_workers = 0;
/*
* If the user has set the parallel_workers reloption, use that; otherwise
* select a default number of workers.
* 不需要优化,直接来自表级存储参数parallel_workers
* 详见第3节直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量
*/
if (rel->rel_parallel_workers != -1)
parallel_workers = rel->rel_parallel_workers;
else
{
/*
* If the number of pages being scanned is insufficient to justify a
* parallel scan, just return zero ... unless it's an inheritance
* child. In that case, we want to generate a parallel path here
* anyway. It might not be worthwhile just for this relation, but
* when combined with all of its inheritance siblings it may well pay
* off.
*/
if (rel->reloptkind == RELOPT_BASEREL &&
((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
(index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
return 0;
if (heap_pages >= 0)
{
int heap_parallel_threshold;
int heap_parallel_workers = 1;
/*
* Select the number of workers based on the log of the size of
* the relation. This probably needs to be a good deal more
* sophisticated, but we need something here for now. Note that
* the upper limit of the min_parallel_table_scan_size GUC is
* chosen to prevent overflow here.
*/
heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
{
heap_parallel_workers++;
heap_parallel_threshold *= 3;
if (heap_parallel_threshold > INT_MAX / 3)
break; /* avoid overflow */
}
parallel_workers = heap_parallel_workers;
}
if (index_pages >= 0)
{
int index_parallel_workers = 1;
int index_parallel_threshold;
/* same calculation as for heap_pages above */
index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
{
index_parallel_workers++;
index_parallel_threshold *= 3;
if (index_parallel_threshold > INT_MAX / 3)
break; /* avoid overflow */
}
if (parallel_workers > 0)
parallel_workers = Min(parallel_workers, index_parallel_workers);
else
parallel_workers = index_parallel_workers;
}
}
/* In no case use more than caller supplied maximum number of workers */
parallel_workers = Min(parallel_workers, max_workers);
return parallel_workers;
}
3 直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量
如果要使用指定数量的并行worker数,必须使用存储参数parallel_workers。
alter table tab set (parallel_workers=8);
配置了存储参数后,compute_parallel_worker就不在对并行worker数优化,直接返回配置的parallel_workers数量。
if (rel->rel_parallel_workers != -1)
parallel_workers = rel->rel_parallel_workers;
-
注意:如果不设置表级存储参数parallel_workers,实际的并行工作线程数由compute_parallel_worker根据heap_pages、index_pages、max_workers来决定并行工作线程数量,因此会出现实际并行工作数量不等于postgresql.conf中设置的max_parallel_workers_per_gather的情况。
-
在次强制:并行workers并不能显箸提升性能。个人不建议使用并行worker进程,大多数情况下采用postgresql.conf默认配置即可。