借着 ByConity 的邀测活动体验一下 bsp 能力,ByConity 也很贴心的提供了标准环境,下面开始体验。
测试环境
版本 | 配置 | |
ByConity v1.0.1 | 集群规格 | Worker:4 * 16core 64G Server:1 * 16core 64G TSO:1 * 4core 16G Daemon Manager:1 * 4core 16G Resource Manager:1 * 8core 32G 存储:对象存储 TOS FoundationDB:3 * 4core 16G |
TPD-DS 测试
登录 ByConity
乍一看咋是 ByteHouse,不过了解的应该都知道,ByConity 是 ByteHouse 的开源版,不纠结继续。
切换到测试库
已经贴心的创建好了库表,准备体验。
查看数据量
看数据量应该是 1T 的 tpcds 数据集。
指定方言
set dialect_type = 'ANSI';
如果写错了,也给了友好的提示,告诉用户支持 'MYSQL', 'ANSI', 'CLICKHOUSE' 三种
SQL 测试
先来个简单点的 q3
select dt.d_year
,item.i_brand_id brand_id
,item.i_brand brand
,sum(ss_ext_sales_price) sum_agg
from date_dim dt
,store_sales
,item
where dt.d_date_sk = store_sales.ss_sold_date_sk
and store_sales.ss_item_sk = item.i_item_sk
and item.i_manufact_id = 128
and dt.d_moy=11
group by dt.d_year
,item.i_brand
,item.i_brand_id
order by dt.d_year
,sum_agg desc
,brand_id
limit 100;
结果如下:
再来个 q1
with customer_total_return as
(
select
sr_customer_sk as ctr_customer_sk,
sr_store_sk as ctr_store_sk
,sum(sr_return_amt) as ctr_total_return
from store_returns, date_dim
where sr_returned_date_sk = d_date_sk and d_year = 2000
group by sr_customer_sk,sr_store_sk)
select c_customer_id
from customer_total_return ctr1, store, customer
where ctr1.ctr_total_return > (
select avg(ctr_total_return) *1.2
from customer_total_return ctr2
where ctr1.ctr_store_sk = ctr2.ctr_store_sk
)
and s_store_sk = ctr1.ctr_store_sk
and s_state = 'TN'
and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;
结果如下:
居然报错了!!!
当然 ByConity 不会出现这么低级的问题,是因为 dialect_type
默认是 CLICKHOUSE
是因为重新登录没有修改 dialect_type
导致的,改成 ANSI,正常运行,结果如下:
再来个复杂的 q78
with ws as
(select d_year AS ws_sold_year, ws_item_sk,
ws_bill_customer_sk ws_customer_sk,
sum(ws_quantity) ws_qty,
sum(ws_wholesale_cost) ws_wc,
sum(ws_sales_price) ws_sp
from web_sales
left join web_returns on wr_order_number=ws_order_number and ws_item_sk=wr_item_sk
join date_dim on ws_sold_date_sk = d_date_sk
where wr_order_number is null
group by d_year, ws_item_sk, ws_bill_customer_sk
),
cs as
(select d_year AS cs_sold_year, cs_item_sk,
cs_bill_customer_sk cs_customer_sk,
sum(cs_quantity) cs_qty,
sum(cs_wholesale_cost) cs_wc,
sum(cs_sales_price) cs_sp
from catalog_sales
left join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
join date_dim on cs_sold_date_sk = d_date_sk
where cr_order_number is null
group by d_year, cs_item_sk, cs_bill_customer_sk
),
ss as
(select d_year AS ss_sold_year, ss_item_sk,
ss_customer_sk,
sum(ss_quantity) ss_qty,
sum(ss_wholesale_cost) ss_wc,
sum(ss_sales_price) ss_sp
from store_sales
left join store_returns on sr_ticket_number=ss_ticket_number and ss_item_sk=sr_item_sk
join date_dim on ss_sold_date_sk = d_date_sk
where sr_ticket_number is null
group by d_year, ss_item_sk, ss_customer_sk
)
select
ss_sold_year, ss_item_sk, ss_customer_sk,
round(ss_qty/(coalesce(ws_qty,0)+coalesce(cs_qty,0)),2) ratio,
ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price,
coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty,
coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost,
coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price
from ss
left join ws on (ws_sold_year=ss_sold_year and ws_item_sk=ss_item_sk and ws_customer_sk=ss_customer_sk)
left join cs on (cs_sold_year=ss_sold_year and cs_item_sk=ss_item_sk and cs_customer_sk=ss_customer_sk)
where (coalesce(ws_qty,0)>0 or coalesce(cs_qty, 0)>0) and ss_sold_year=2000
order by
ss_sold_year, ss_item_sk, ss_customer_sk,
ss_qty desc, ss_wc desc, ss_sp desc,
other_chan_qty,
other_chan_wholesale_cost,
other_chan_sales_price,
ratio
LIMIT 100;
结果如下:
内存超限了。。。
ByConity 在这种场景下提供了两个参数来解决这个问题
在 q78 最后加上
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 12;
其中参数distributed_max_parallel_size
可以设置为 4 的其他整数倍(因为 Worker 的数量为4)
再次执行
成功执行。
这里提高了并发,并减少了内存使用,但是在执行时候看不到 CPU 和内存的使用情况,可以像 clickhouse 一样在客户端显示实时使用情况,例如:
在 tpcds 测试中,除了上述 SQL ,还有 q64、q67、q68、q75、q79 也都是很复杂的。
尝试跑下 q64
with cs_ui as
(select cs_item_sk
,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
from catalog_sales
,catalog_returns
where cs_item_sk = cr_item_sk
and cs_order_number = cr_order_number
group by cs_item_sk
having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)),
cross_sales as
(select i_product_name product_name
,i_item_sk item_sk
,s_store_name store_name
,s_zip store_zip
,ad1.ca_street_number b_street_number
,ad1.ca_street_name b_street_name
,ad1.ca_city b_city
,ad1.ca_zip b_zip
,ad2.ca_street_number c_street_number
,ad2.ca_street_name c_street_name
,ad2.ca_city c_city
,ad2.ca_zip c_zip
,d1.d_year as syear
,d2.d_year as fsyear
,d3.d_year s2year
,count(*) cnt
,sum(ss_wholesale_cost) s1
,sum(ss_list_price) s2
,sum(ss_coupon_amt) s3
FROM store_sales
,store_returns
,cs_ui
,date_dim d1
,date_dim d2
,date_dim d3
,store
,customer
,customer_demographics cd1
,customer_demographics cd2
,promotion
,household_demographics hd1
,household_demographics hd2
,customer_address ad1
,customer_address ad2
,income_band ib1
,income_band ib2
,item
WHERE ss_store_sk = s_store_sk AND
ss_sold_date_sk = d1.d_date_sk AND
ss_customer_sk = c_customer_sk AND
ss_cdemo_sk= cd1.cd_demo_sk AND
ss_hdemo_sk = hd1.hd_demo_sk AND
ss_addr_sk = ad1.ca_address_sk and
ss_item_sk = i_item_sk and
ss_item_sk = sr_item_sk and
ss_ticket_number = sr_ticket_number and
ss_item_sk = cs_ui.cs_item_sk and
c_current_cdemo_sk = cd2.cd_demo_sk AND
c_current_hdemo_sk = hd2.hd_demo_sk AND
c_current_addr_sk = ad2.ca_address_sk and
c_first_sales_date_sk = d2.d_date_sk and
c_first_shipto_date_sk = d3.d_date_sk and
ss_promo_sk = p_promo_sk and
hd1.hd_income_band_sk = ib1.ib_income_band_sk and
hd2.hd_income_band_sk = ib2.ib_income_band_sk and
cd1.cd_marital_status <> cd2.cd_marital_status and
i_color in ('purple','burlywood','indian','spring','floral','medium') and
i_current_price between 64 and 64 + 10 and
i_current_price between 64 + 1 and 64 + 15
group by i_product_name
,i_item_sk
,s_store_name
,s_zip
,ad1.ca_street_number
,ad1.ca_street_name
,ad1.ca_city
,ad1.ca_zip
,ad2.ca_street_number
,ad2.ca_street_name
,ad2.ca_city
,ad2.ca_zip
,d1.d_year
,d2.d_year
,d3.d_year
)
select cs1.product_name
,cs1.store_name
,cs1.store_zip
,cs1.b_street_number
,cs1.b_street_name
,cs1.b_city
,cs1.b_zip
,cs1.c_street_number
,cs1.c_street_name
,cs1.c_city
,cs1.c_zip
,cs1.syear
,cs1.cnt
,cs1.s1 as s11
,cs1.s2 as s21
,cs1.s3 as s31
,cs2.s1 as s12
,cs2.s2 as s22
,cs2.s3 as s32
,cs2.syear
,cs2.cnt
from cross_sales cs1,cross_sales cs2
where cs1.item_sk=cs2.item_sk and
cs1.syear = 1999 and
cs2.syear = 1999 + 1 and
cs2.cnt <= cs1.cnt and
cs1.store_name = cs2.store_name and
cs1.store_zip = cs2.store_zip
order by cs1.product_name
,cs1.store_name
,cs2.cnt
,cs1.s1
,cs2.s1;
结果如下:
确实复杂,执行了 84s,看下执行计划
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Projection Est. 21719222 rows, cost 6.970330e+09 │
│ │ Expressions: b_city:=ca_city_2, b_street_name:=ca_street_name_2, b_street_number:=ca_street_number_2, b_zip:=ca_zip_2, c_city:=ca_city_3, c_street_name:=ca_street_name_3, c_street_number:=ca_street_number_3, c_zip:=ca_zip_3, cnt:=`expr#count()_1`, cnt_1:=`expr#count()_2`, product_name:=i_product_name_1, s11:=`expr#sum(ss_wholesale_cost)_1`, s12:=`expr#sum(ss_wholesale_cost)_2`, s21:=`expr#sum(ss_list_price)_1`, s22:=`expr#sum(ss_list_price)_2`, s31:=`expr#sum(ss_coupon_amt)_1`, s32:=`expr#sum(ss_coupon_amt)_2`, store_name:=s_store_name_2, store_zip:=s_zip_2, syear:=d_year_3, syear_1:=d_year_6 │
│ └─ Sorting Est. 21719222 rows, cost 6.968723e+09 │
│ │ Order by: {i_product_name_1 ASC NULLS LAST, s_store_name_2 ASC NULLS LAST, expr#count()_2 ASC NULLS LAST, expr#sum(ss_wholesale_cost)_1 ASC NULLS LAST, expr#sum(ss_wholesale_cost)_2 ASC NULLS LAST} │
│ └─ Gather Exchange Est. 21719222 rows, cost 6.968723e+09 │
│ └─ Sorting Est. 21719222 rows, cost 6.965248e+09 │
│ │ Order by: {i_product_name_1 ASC NULLS LAST, s_store_name_2 ASC NULLS LAST, expr#count()_2 ASC NULLS LAST, expr#sum(ss_wholesale_cost)_1 ASC NULLS LAST, expr#sum(ss_wholesale_cost)_2 ASC NULLS LAST} │
│ └─ Inner Join Est. 21719222 rows, cost 6.965248e+09 │
│ │ Condition: i_item_sk_2 == i_item_sk_1, s_store_name_2 == s_store_name_1, s_zip_2 == s_zip_1 │
│ │ Filter: `expr#count()_2` <= `expr#count()_1` │
│ ├─ Repartition Exchange Est. 154787 rows, cost 3.572475e+05 │
│ │ │ Partition by: {i_item_sk_2, s_store_name_2, s_zip_2} │
│ │ └─ Filter Est. 77393 rows, cost 3.324814e+05 │
│ │ │ Condition: d_year_6 = cast(2000, 'UInt32') │
│ │ └─ Buffer Est. 309573 rows, cost 3.095730e+05 │
│ │ └─ CTERef[0] Est. 309573 rows, cost 3.095730e+05 │
│ └─ Repartition Exchange Est. 154787 rows, cost 3.572475e+05 │
│ │ Partition by: {i_item_sk_1, s_store_name_1, s_zip_1} │
│ └─ Filter Est. 77393 rows, cost 3.324814e+05 │
│ │ Condition: d_year_3 = 1999 │
│ └─ Buffer Est. 309573 rows, cost 3.095730e+05 │
│ └─ CTERef[0] Est. 309573 rows, cost 3.095730e+05 │
│ CTEDef [0] │
│ Projection Est. 309573 rows, cost 6.956253e+09 │
│ │ Expressions: [ca_city, ca_city_1, ca_street_name, ca_street_name_1, ca_street_number, ca_street_number_1, ca_zip, ca_zip_1, d_year, expr#count(), expr#sum(ss_coupon_amt), expr#sum(ss_list_price), expr#sum(ss_wholesale_cost), i_product_name, s_store_name, s_zip], i_item_sk:=ss_item_sk │
│ └─ MergingAggregated Est. 309573 rows, cost 6.956230e+09 │
│ └─ Repartition Exchange Est. 364649 rows, cost 6.956230e+09 │
│ │ Partition by: {i_product_name, ss_item_sk, s_store_name, s_zip, ca_street_number, ca_street_name, ca_city, ca_zip, ca_street_number_1, ca_street_name_1, ca_city_1, ca_zip_1, d_year, d_year_1, d_year_2} │
│ └─ Aggregating Est. 364649 rows, cost 6.956171e+09 │
│ │ Group by: {i_product_name, ss_item_sk, s_store_name, s_zip, ca_street_number, ca_street_name, ca_city, ca_zip, ca_street_number_1, ca_street_name_1, ca_city_1, ca_zip_1, d_year, d_year_1, d_year_2} │
│ │ Aggregates: expr#count():=AggNull(count)(), expr#sum(ss_wholesale_cost):=AggNull(sum)(ss_wholesale_cost), expr#sum(ss_list_price):=AggNull(sum)(ss_list_price), expr#sum(ss_coupon_amt):=AggNull(sum)(ss_coupon_amt) │
│ └─ Inner Join Est. 364649 rows, cost 6.956171e+09 │
│ │ Condition: ca_address_sk_1 == c_current_addr_sk │
│ │ Runtime Filters Builder: {c_current_addr_sk} │
│ ├─ Repartition Exchange Est. 6000000 rows, cost 5.844000e+06 │
│ │ │ Partition by: {ca_address_sk_1} │
│ │ └─ Filter Est. 6000000 rows, cost 4.884000e+06 │
│ │ │ Condition: Runtime Filters: {ca_address_sk_1} │
│ │ └─ TableScan test_elt.customer_address Est. 6000000 rows, cost 4.440000e+06 │
│ │ Where: Runtime Filters: {ca_address_sk} │
│ │ Outputs: ca_address_sk_1:=ca_address_sk, ca_street_number_1:=ca_street_number, ca_street_name_1:=ca_street_name, ca_city_1:=ca_city, ca_zip_1:=ca_zip │
│ └─ Repartition Exchange Est. 364649 rows, cost 6.947531e+09 │
│ │ Partition by: {c_current_addr_sk} │
│ └─ Inner Join Est. 355608 rows, cost 6.947473e+09 │
│ │ Condition: ca_address_sk == ss_addr_sk │
│ │ Runtime Filters Builder: {ss_addr_sk} │
│ ├─ Repartition Exchange Est. 6000000 rows, cost 5.844000e+06 │
│ │ │ Partition by: {ca_address_sk} │
│ │ └─ Filter Est. 6000000 rows, cost 4.884000e+06 │
│ │ │ Condition: Runtime Filters: {ca_address_sk} │
│ │ └─ TableScan test_elt.customer_address Est. 6000000 rows, cost 4.440000e+06 │
│ │ Where: Runtime Filters: {ca_address_sk} │
│ │ Outputs: [ca_address_sk, ca_street_number, ca_street_name, ca_city, ca_zip] │
│ └─ Repartition Exchange Est. 355608 rows, cost 6.938847e+09 │
│ │ Partition by: {ss_addr_sk} │
│ └─ Inner Join Est. 354159 rows, cost 6.938790e+09 │
│ │ Condition: cd_demo_sk_1 == c_current_cdemo_sk │
│ │ Filter: cd_marital_status != cd_marital_status_1 │
│ │ Runtime Filters Builder: {c_current_cdemo_sk} │
│ ├─ Repartition Exchange Est. 1920800 rows, cost 1.870859e+06 │
│ │ │ Partition by: {cd_demo_sk_1} │
│ │ └─ Filter Est. 1920800 rows, cost 1.563531e+06 │
│ │ │ Condition: Runtime Filters: {cd_demo_sk_1} │
│ │ └─ TableScan test_elt.customer_demographics Est. 1920800 rows, cost 1.421392e+06 │
│ │ Where: Runtime Filters: {cd_demo_sk} │
│ │ Outputs: cd_demo_sk_1:=cd_demo_sk, cd_marital_status_1:=cd_marital_status │
│ └─ Repartition Exchange Est. 708317 rows, cost 6.935220e+09 │
│ │ Partition by: {c_current_cdemo_sk} │
│ └─ Inner Join Est. 706525 rows, cost 6.935107e+09 │
│ │ Condition: cd_demo_sk == ss_cdemo_sk │
│ │ Runtime Filters Builder: {ss_cdemo_sk} │
│ ├─ Repartition Exchange Est. 1920800 rows, cost 1.870859e+06 │
│ │ │ Partition by: {cd_demo_sk} │
│ │ └─ Filter Est. 1920800 rows, cost 1.563531e+06 │
│ │ │ Condition: Runtime Filters: {cd_demo_sk} │
│ │ └─ TableScan test_elt.customer_demographics Est. 1920800 rows, cost 1.421392e+06 │
│ │ Where: Runtime Filters: {cd_demo_sk} │
│ │ Outputs: [cd_demo_sk, cd_marital_status] │
│ └─ Repartition Exchange Est. 706525 rows, cost 6.931409e+09 │
│ │ Partition by: {ss_cdemo_sk} │
│ └─ Inner Join Est. 706525 rows, cost 6.931296e+09 │
│ │ Condition: c_first_shipto_date_sk == d_date_sk_2 │
│ ├─ Repartition Exchange Est. 706525 rows, cost 6.930619e+09 │
│ │ │ Partition by: {c_first_shipto_date_sk} │
│ │ └─ Inner Join Est. 706525 rows, cost 6.930506e+09 │
│ │ │ Condition: c_first_sales_date_sk == d_date_sk_1 │
│ │ ├─ Repartition Exchange Est. 706525 rows, cost 6.929829e+09 │
│ │ │ │ Partition by: {c_first_sales_date_sk} │
│ │ │ └─ Inner Join Est. 706525 rows, cost 6.929716e+09 │
│ │ │ │ Condition: hd_income_band_sk_1 == ib_income_band_sk_1 │
│ │ │ ├─ Inner Join Est. 706525 rows, cost 6.929193e+09 │
│ │ │ │ │ Condition: c_current_hdemo_sk == hd_demo_sk_1 │
│ │ │ │ ├─ Inner Join Est. 706525 rows, cost 6.928626e+09 │
│ │ │ │ │ │ Condition: c_customer_sk == ss_customer_sk │
│ │ │ │ │ │ Runtime Filters Builder: {ss_customer_sk} │
│ │ │ │ │ ├─ Repartition Exchange Est. 12000000 rows, cost 1.168800e+07 │
│ │ │ │ │ │ │ Partition by: {c_customer_sk} │
│ │ │ │ │ │ └─ Filter Est. 12000000 rows, cost 9.768000e+06 │
│ │ │ │ │ │ │ Condition: Runtime Filters: {c_customer_sk} │
│ │ │ │ │ │ └─ TableScan test_elt.customer Est. 12000000 rows, cost 8.880000e+06 │
│ │ │ │ │ │ Where: Runtime Filters: {c_customer_sk} │
│ │ │ │ │ │ Outputs: [c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk, c_first_sales_date_sk] │
│ │ │ │ │ └─ Repartition Exchange Est. 703646 rows, cost 6.911385e+09 │
│ │ │ │ │ │ Partition by: {ss_customer_sk} │
│ │ │ │ │ └─ Inner Join Est. 703646 rows, cost 6.911272e+09 │
│ │ │ │ │ │ Condition: ss_store_sk == s_store_sk │
│ │ │ │ │ ├─ Inner Join Est. 703646 rows, cost 6.910745e+09 │
│ │ │ │ │ │ │ Condition: hd_income_band_sk == ib_income_band_sk │
│ │ │ │ │ │ ├─ Inner Join Est. 703646 rows, cost 6.910224e+09 │
│ │ │ │ │ │ │ │ Condition: ss_hdemo_sk == hd_demo_sk │
│ │ │ │ │ │ │ ├─ Inner Join Est. 703646 rows, cost 6.909659e+09 │
│ │ │ │ │ │ │ │ │ Condition: ss_promo_sk == p_promo_sk │
│ │ │ │ │ │ │ │ ├─ Inner Join Est. 709321 rows, cost 6.909127e+09 │
│ │ │ │ │ │ │ │ │ │ Condition: ss_sold_date_sk == d_date_sk │
│ │ │ │ │ │ │ │ │ │ Runtime Filters Builder: {d_date_sk} │
│ │ │ │ │ │ │ │ │ ├─ Inner Join Est. 1778671 rows, cost 6.908143e+09 │
│ │ │ │ │ │ │ │ │ │ │ Condition: ss_item_sk == sr_item_sk, ss_ticket_number == sr_ticket_number │
│ │ │ │ │ │ │ │ │ │ │ Runtime Filters Builder: {sr_item_sk,sr_ticket_number} │
│ │ │ │ │ │ │ │ │ │ ├─ Filter Est. 2879987999 rows, cost 2.344310e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Condition: Runtime Filters: {ss_item_sk, ss_sold_date_sk, ss_ticket_number} │
│ │ │ │ │ │ │ │ │ │ │ └─ TableScan test_elt.store_sales Est. 2879987999 rows, cost 2.131191e+09 │
│ │ │ │ │ │ │ │ │ │ │ Where: Runtime Filters: {ss_item_sk, ss_sold_date_sk, ss_ticket_number} │
│ │ │ │ │ │ │ │ │ │ │ Outputs: [ss_sold_date_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_wholesale_cost, ss_list_price, ss_coupon_amt] │
│ │ │ │ │ │ │ │ │ │ └─ Inner Join Est. 185279 rows, cost 3.497355e+09 │
│ │ │ │ │ │ │ │ │ │ │ Condition: sr_item_sk == cs_item_sk │
│ │ │ │ │ │ │ │ │ │ │ Runtime Filters Builder: {cs_item_sk} │
│ │ │ │ │ │ │ │ │ │ ├─ Filter Est. 287999764 rows, cost 2.344318e+08 │
│ │ │ │ │ │ │ │ │ │ │ │ Condition: Runtime Filters: {sr_item_sk} │
│ │ │ │ │ │ │ │ │ │ │ └─ TableScan test_elt.store_returns Est. 287999764 rows, cost 2.131198e+08 │
│ │ │ │ │ │ │ │ │ │ │ Where: Runtime Filters: {sr_item_sk} │
│ │ │ │ │ │ │ │ │ │ │ Outputs: [sr_item_sk, sr_ticket_number] │
│ │ │ │ │ │ │ │ │ │ └─ Inner Join Est. 195 rows, cost 3.156294e+09 │
│ │ │ │ │ │ │ │ │ │ │ Condition: cs_item_sk == i_item_sk │
│ │ │ │ │ │ │ │ │ │ │ Runtime Filters Builder: {i_item_sk} │
│ │ │ │ │ │ │ │ │ │ ├─ Projection Est. 75000 rows, cost 3.156022e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Expressions: [cs_item_sk] │
│ │ │ │ │ │ │ │ │ │ │ └─ Filter Est. 75000 rows, cost 3.156016e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Condition: `expr#sum(cs_ext_list_price)` > (2 * `expr#sum(plus(plus(cr_refunded_cash, cr_reversed_charge), cr_store_credit))`) │
│ │ │ │ │ │ │ │ │ │ │ └─ Aggregating Est. 300000 rows, cost 3.155994e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Group by: {cs_item_sk} │
│ │ │ │ │ │ │ │ │ │ │ │ Aggregates: expr#sum(cs_ext_list_price):=AggNull(sum)(cs_ext_list_price), expr#sum(plus(plus(cr_refunded_cash, cr_reversed_charge), cr_store_credit)):=AggNull(sum)(expr#plus(plus(cr_refunded_cash, cr_reversed_charge), cr_store_credit)) │
│ │ │ │ │ │ │ │ │ │ │ └─ Projection Est. 978035432 rows, cost 2.430634e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Expressions: [cs_ext_list_price, cs_item_sk], expr#plus(plus(cr_refunded_cash, cr_reversed_charge), cr_store_credit):=(cr_refunded_cash + cr_reversed_charge) + cr_store_credit │
│ │ │ │ │ │ │ │ │ │ │ └─ Inner (PARALLEL_HASH) Join Est. 978035432 rows, cost 2.358259e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Condition: cs_item_sk == cr_item_sk, cs_order_number == cr_order_number │
│ │ │ │ │ │ │ │ │ │ │ │ Runtime Filters Builder: {cr_order_number} │
│ │ │ │ │ │ │ │ │ │ │ ├─ Filter Est. 1439980416 rows, cost 1.172144e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ │ Condition: Runtime Filters: {cs_item_sk, cs_order_number} │
│ │ │ │ │ │ │ │ │ │ │ │ └─ TableScan test_elt.catalog_sales Est. 1439980416 rows, cost 1.065586e+09 │
│ │ │ │ │ │ │ │ │ │ │ │ Where: Runtime Filters: {cs_item_sk, cs_order_number} │
│ │ │ │ │ │ │ │ │ │ │ │ Outputs: [cs_item_sk, cs_order_number, cs_ext_list_price] │
│ │ │ │ │ │ │ │ │ │ │ └─ Filter Est. 143996756 rows, cost 1.172134e+08 │
│ │ │ │ │ │ │ │ │ │ │ │ Condition: Runtime Filters: {cr_item_sk} │
│ │ │ │ │ │ │ │ │ │ │ └─ TableScan test_elt.catalog_returns Est. 143996756 rows, cost 1.065576e+08 │
│ │ │ │ │ │ │ │ │ │ │ Where: Runtime Filters: {cr_item_sk} │
│ │ │ │ │ │ │ │ │ │ │ Outputs: [cr_item_sk, cr_order_number, cr_refunded_cash, cr_reversed_charge, cr_store_credit] │
│ │ │ │ │ │ │ │ │ │ └─ Projection Est. 195 rows, cost 2.442144e+05 │
│ │ │ │ │ │ │ │ │ │ │ Expressions: [i_item_sk, i_product_name] │
│ │ │ │ │ │ │ │ │ │ └─ Filter Est. 195 rows, cost 2.442000e+05 │
│ │ │ │ │ │ │ │ │ │ │ Condition: (i_color IN ('burlywood', 'floral', 'indian', 'medium', 'purple', 'spring')) AND (i_current_price >= 65.) AND (i_current_price <= 74.) │
│ │ │ │ │ │ │ │ │ │ └─ TableScan test_elt.item Est. 300000 rows, cost 2.220000e+05 │
│ │ │ │ │ │ │ │ │ │ Where: (i_color IN ('burlywood', 'floral', 'indian', 'medium', 'purple', 'spring')) AND (i_current_price >= 65.) AND (i_current_price <= 74.) │
│ │ │ │ │ │ │ │ │ │ Outputs: [i_item_sk, i_current_price, i_color, i_product_name] │
│ │ │ │ │ │ │ │ │ └─ Broadcast Exchange Est. 729 rows, cost 5.992909e+04 │
│ │ │ │ │ │ │ │ │ └─ Filter Est. 729 rows, cost 5.946189e+04 │
│ │ │ │ │ │ │ │ │ │ Condition: (d_year = 1999) OR (d_year = cast(2000, 'UInt32')) │
│ │ │ │ │ │ │ │ │ └─ TableScan test_elt.date_dim Est. 73049 rows, cost 5.405626e+04 │
│ │ │ │ │ │ │ │ │ Where: (d_year = 1999) OR (d_year = cast(2000, 'UInt32')) │
│ │ │ │ │ │ │ │ │ Outputs: [d_date_sk, d_year] │
│ │ │ │ │ │ │ │ └─ Broadcast Exchange Est. 1500 rows, cost 2.070640e+03 │
│ │ │ │ │ │ │ │ └─ TableScan test_elt.promotion Est. 1500 rows, cost 1.110000e+03 │
│ │ │ │ │ │ │ │ Outputs: [p_promo_sk] │
│ │ │ │ │ │ │ └─ Broadcast Exchange Est. 7200 rows, cost 9.936640e+03 │
│ │ │ │ │ │ │ └─ TableScan test_elt.household_demographics Est. 7200 rows, cost 5.328000e+03 │
│ │ │ │ │ │ │ Outputs: [hd_demo_sk, hd_income_band_sk] │
│ │ │ │ │ │ └─ Broadcast Exchange Est. 20 rows, cost 2.824000e+01 │
│ │ │ │ │ │ └─ TableScan test_elt.income_band Est. 20 rows, cost 1.480000e+01 │
│ │ │ │ │ │ Outputs: [ib_income_band_sk] │
│ │ │ │ │ └─ Broadcast Exchange Est. 1002 rows, cost 1.383400e+03 │
│ │ │ │ │ └─ TableScan test_elt.store Est. 1002 rows, cost 7.414800e+02 │
│ │ │ │ │ Outputs: [s_store_sk, s_store_name, s_zip] │
│ │ │ │ └─ Broadcast Exchange Est. 7200 rows, cost 9.936640e+03 │
│ │ │ │ └─ TableScan test_elt.household_demographics Est. 7200 rows, cost 5.328000e+03 │
│ │ │ │ Outputs: hd_demo_sk_1:=hd_demo_sk, hd_income_band_sk_1:=hd_income_band_sk │
│ │ │ └─ Broadcast Exchange Est. 20 rows, cost 2.824000e+01 │
│ │ │ └─ TableScan test_elt.income_band Est. 20 rows, cost 1.480000e+01 │
│ │ │ Outputs: ib_income_band_sk_1:=ib_income_band_sk │
│ │ └─ Repartition Exchange Est. 73049 rows, cost 6.574434e+04 │
│ │ │ Partition by: {d_date_sk_1} │
│ │ └─ TableScan test_elt.date_dim Est. 73049 rows, cost 5.405626e+04 │
│ │ Outputs: d_date_sk_1:=d_date_sk, d_year_1:=d_year │
│ └─ Repartition Exchange Est. 73049 rows, cost 6.574434e+04 │
│ │ Partition by: {d_date_sk_2} │
│ └─ TableScan test_elt.date_dim Est. 73049 rows, cost 5.405626e+04 │
│ Outputs: d_date_sk_2:=d_date_sk, d_year_2:=d_year │
│ note: Runtime Filter is applied for 12 times. │
│ note: CTE(Common Table Expression) is applied for 2 times. │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
执行计划看着就很复杂,不过 ByConity 支持了 RBO + CBO 优化,以及复杂查询,包括 CTE 的优化,可以生成分布式查询计划,不过把 CTE 计划放在主计划下面还是和其他的优化器做法有点不一样。
下面通过减小内存让 q64 OOM,看看 bsp 可不可以让小内存场景依然正常执行
在 q64 后加上以下参数
SETTINGS max_memory_usage=300000000;
执行结果如下:
再加上 bsp 配置参数
SETTINGS max_memory_usage=300000000, bsp_mode = 1, distributed_max_parallel_size = 12;
再次执行结果如下:
虽然这两次都执行出了结果,但是一个是 11492 行,一个是 2 行,暂时不知道是配置没有对还是有了 bug。
总结
总体测试体验还是不错的,bsp 确实能在资源有限的情况下一定程度的解决大查询或复杂查询的问题,但是在想构造一个 OOM 且 bsp 还能运行的参数确实有点难,尝试了好多次,建议可以在查询失败时给出建议,甚至能够自适应开启 bsp 并设置并行度,让用户收到 OOM 时就知道这个 SQL 确实是不能跑了。