PerfEnforce Demonstration: Data Analytics with Performance Guarantees
Created by: ctur
Date: April 2, 2023 2:54 PM
Status: ready to start
Real-time reactive scaling algorithms
Real-time reactive scaling algorithms come in two flavors: 1. proportional-integral control, 2. reinforcement learning.
The proportional-integral control method
“We use a proportional-integral controller (PI) as a method that helps PerfEnforce react based on the magnitude of the error while avoiding oscillations over time.” (Ortiz et al., 2016, p. 3)
At each time step, t, the controller produces an actuator value u(t) that causes the system to produce an output y(t + 1) at the next time step. The goal is for the system output y(t) to be equal to some desired reference output r(t). In an integral controller, the actuator value depends on the accumulation of past errors of the system. This can be represented as:
notes:

$$\frac{t_{\text{real}}(q)}{t_{\text{sla}}(q)}$$
A ratio greater than 1 means resources are under-provisioned; a ratio less than 1 means resources are over-provisioned.
“Integral control alone may be slow to react to changes in the workload. Therefore, we also introduce a proportional control component, to yield a PI controller with the following formulation:” (Ortiz et al., 2016, p. 3)
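The PI formulation itself is not reproduced in these notes, so here is a minimal sketch of the control loop in Python, assuming the system output y(t) is the average real-to-SLA runtime ratio over the previous window of queries, the reference output r(t) is 1.0, and the actuator value is mapped to a cluster size by simple scaling and clamping. The gains kp and ki, the window, and that mapping are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch of a PI-style scaling decision (illustrative, not the paper's equation).
class PIScaler:
    def __init__(self, kp=0.5, ki=0.2, min_workers=4, max_workers=12):
        self.kp = kp
        self.ki = ki
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.error_sum = 0.0          # accumulation of past errors (integral term)

    def next_cluster_size(self, window_ratios, current_size):
        # y(t): average real-to-SLA runtime ratio over the previous window of queries.
        y = sum(window_ratios) / len(window_ratios)
        # e(t): deviation from the reference output r(t) = 1.0.
        # > 0 means queries run too slowly (under-provisioned), < 0 means over-provisioned.
        error = y - 1.0
        self.error_sum += error
        # PI actuation: the proportional term reacts to the current error,
        # the integral term reacts to the accumulated error.
        u = self.kp * error + self.ki * self.error_sum
        # Map the actuator value to a concrete cluster size (illustrative mapping).
        target = current_size + u * current_size
        return max(self.min_workers, min(self.max_workers, round(target)))


scaler = PIScaler()
# Three queries ran ~15-25% slower than their SLA runtimes, so the controller scales up.
print(scaler.next_cluster_size([1.2, 1.25, 1.15], current_size=8))   # -> 9
```

The integral term reacts to persistently missed (or over-met) SLAs, while the proportional term makes the controller respond faster to the current error, which is exactly the motivation quoted above.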
The reinforcement-learning control method
With the RL approach, the cluster-scaling process is modeled as an MDP. The goal is to transition between states so as to obtain the best possible reward. As the system moves between states, the model keeps learning and updating the reward values.
In our setting, each cluster configuration represents a state in the model. We define the reward function to be the real-to-SLA runtime ratio. Our goal is to favor states with the reward closest to 1.0, where the real query runtimes are closest to the SLA runtimes (see Section 3.1). In other words, the goal is to favor a reward close to 1; a reward of 1 means the real runtime is closest to the runtime required by the SLA.
In RL, every time the system transitions to a state s, it updates the reward function for that state. In our setting, we use the following equation, where R(s′) denotes the updated reward for state s:
This equation describes how the reward value is updated whenever a state transition occurs.
To understand this process: in state s, a set of inputs is fed to the system, producing some effect that moves the system to a new state s′. That effect is exactly the term multiplied by alpha, and it can be positive or negative.
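The reward-update equation itself is not reproduced in these notes either, so the sketch below illustrates the idea under an assumption: the update has the incremental form R(s) ← R(s) + α·(observed ratio − R(s)), where the α-weighted correction is the positive-or-negative "effect" described above. The state set, α, and the starting reward values are made up for illustration.

```python
# Minimal sketch of the RL view: each cluster configuration is a state, and the
# reward tracks the real-to-SLA runtime ratio observed in that state.
ALPHA = 0.3  # learning rate (illustrative)

# R(s): current reward estimate per cluster configuration (state).
# Smaller clusters tend to run queries slower (ratio > 1), larger ones faster (ratio < 1).
rewards = {4: 1.5, 6: 1.3, 8: 1.1, 10: 0.8, 12: 0.6}

def update_reward(state, observed_ratio):
    """After running a query in `state`, move R(s) toward the observed real-to-SLA ratio.
    The alpha-weighted correction can be positive or negative."""
    rewards[state] = rewards[state] + ALPHA * (observed_ratio - rewards[state])

def pick_next_state():
    """Favor the state whose reward is closest to 1.0 (real runtime closest to the SLA runtime)."""
    return min(rewards, key=lambda s: abs(rewards[s] - 1.0))

update_reward(8, observed_ratio=0.95)   # a query in the 8-node config ran slightly faster than its SLA
print(pick_next_state())                # -> 8 in this toy example
```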
Besides the two reactive approaches above, there is another class of methods known as proactive scaling algorithms:
Proactive Scaling Algorithms (prediction-based approaches)
“Instead of approaches that react to runtime errors such as the PI controller and reinforcement learning, we also explore an approach that makes use of a predictive model. For each incoming query, PerfEnforce predicts the runtime for the query for each configuration and switches to the cheapest configuration that meets the SLA. PerfEnforce first builds an offline model. For training data, we use the Parallel Data Generation Framework tool [14] to generate a 10GB dataset with a set of 2500 queries. Training data consists of query plan features including the estimated max cost, estimated number of rows, estimated width, and number of workers. We observe that we can significantly improve this offline model if we incorporate information about specific queries that the user executes on his data. We achieve this goal by using an online machine learning model: as the user executes queries, PerfEnforce improves the model in an online fashion. We use the MOA (Massive Online Analysis) tool for online learning [4].” (Ortiz et al., 2016, p. 4)
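To make the shape of that training data concrete, here is a small sketch of the offline step, assuming a plain least-squares linear model over the four query-plan features named in the quote (estimated max cost, estimated number of rows, estimated width, number of workers). The real system trains on PDGF-generated queries and uses MOA rather than numpy, and every number below is synthetic.

```python
# Sketch of the offline model: one training row per (query, configuration) pair,
# with the four query-plan features and a runtime label.
import numpy as np

rng = np.random.default_rng(0)

# Columns: est_max_cost, est_rows, est_width, num_workers (synthetic values).
X = np.column_stack([
    rng.uniform(1e3, 1e6, 2500),     # estimated max cost
    rng.uniform(1e4, 1e7, 2500),     # estimated number of rows
    rng.uniform(8, 256, 2500),       # estimated width (bytes per row)
    rng.integers(4, 13, 2500),       # number of workers in the configuration
])
# Synthetic "runtimes": cost, rows, and width increase runtime; more workers decrease it.
y = 30 + 1e-4 * X[:, 0] + 1e-6 * X[:, 1] + 0.01 * X[:, 2] - 2.0 * X[:, 3] + rng.normal(0, 1, 2500)

# Offline model: least-squares fit with an intercept term.
A = np.column_stack([X, np.ones(len(X))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_runtime(est_max_cost, est_rows, est_width, num_workers):
    return float(np.dot(weights, [est_max_cost, est_rows, est_width, num_workers, 1.0]))

# Predict the runtime of one query plan on an 8-worker configuration.
print(predict_runtime(5e5, 2e6, 64, 8))
```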
The concept of online learning
“Online Learning (OL). We use the perceptron algorithm for the online model. The perceptron algorithm works by adjusting model weights for each new data point. We find that it adapts more quickly to new information than an active-learning based approach. PerfEnforce initiates the perceptron model by learning from the training set. For the first query, PerfEnforce uses this model to predict a runtime for each possible configuration. The cluster size with the closest runtime to the query’s SLA is chosen. Once the system runs the query and knows the real runtime, it adds this information to the model” (Ortiz et al., 2016, p. 4)
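A minimal sketch of that loop, assuming a perceptron-style linear model whose weights are nudged toward each observed runtime: predict a runtime per candidate cluster size, pick the size whose prediction is closest to the query's SLA, then fold the real runtime back into the weights. The feature scaling, learning rate, and update rule are illustrative stand-ins for MOA's implementation.

```python
# Online learning sketch: choose a cluster size, observe the real runtime, update the model.
LEARNING_RATE = 1e-3
CLUSTER_SIZES = [4, 6, 8, 10, 12]

# One weight per feature: est_max_cost, est_rows, est_width, num_workers, bias.
weights = [0.0, 0.0, 0.0, 0.0, 0.0]

def features(query, num_workers):
    # Features are scaled down so the update stays numerically stable.
    return [query["est_max_cost"] / 1e6, query["est_rows"] / 1e7,
            query["est_width"] / 256.0, num_workers / 12.0, 1.0]

def predict(query, num_workers):
    return sum(w * x for w, x in zip(weights, features(query, num_workers)))

def choose_cluster(query, sla_runtime):
    # Pick the configuration whose predicted runtime is closest to the SLA.
    return min(CLUSTER_SIZES, key=lambda n: abs(predict(query, n) - sla_runtime))

def observe(query, num_workers, real_runtime):
    # Perceptron-style update: nudge the weights toward the observed runtime.
    error = real_runtime - predict(query, num_workers)
    for i, x in enumerate(features(query, num_workers)):
        weights[i] += LEARNING_RATE * error * x

query = {"est_max_cost": 5e5, "est_rows": 2e6, "est_width": 64}
size = choose_cluster(query, sla_runtime=60.0)   # untrained model: every prediction is 0, smallest size wins
observe(query, size, real_runtime=75.0)          # the real runtime feeds the online update
print(size, [round(w, 4) for w in weights])      # chosen size and updated weights
```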
“To ensure high quality of service even early on in a query session, we add a buffer based on the observed prediction errors. We call this method online learning with control buffer (OL+B). We predict the runtime of the next query tpred(qt+1) as follows:” (Ortiz et al., 2016, p. 4)
Simply increasing the learning rate would increase oscillation. The formula below is used to predict the runtime of query $q_{t+1}$ at time step t+1.
“PerfEnforce adjusts that runtime if the previous window of queries resulted in a positive percent error. Based on this adjusted prediction, PerfEnforce allocates the cluster size with the predicted runtime closest to the SLA deadline.” (Ortiz et al., 2016, p. 4)
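Since the exact formula for tpred(qt+1) is not reproduced in these notes, the sketch below only illustrates the idea, assuming the buffer is the mean percent error over the previous window of queries and is applied only when that error is positive (i.e., when recent queries ran slower than predicted).

```python
# Sketch of the OL+B adjustment: inflate the raw prediction when the recent
# window of queries was under-predicted, so the system errs toward larger clusters.
WINDOW = 5  # number of recent queries considered (illustrative)

def buffered_prediction(raw_prediction, recent_real, recent_predicted):
    """Adjust the next prediction using percent errors from the last WINDOW queries."""
    errors = [(real - pred) / pred
              for real, pred in zip(recent_real[-WINDOW:], recent_predicted[-WINDOW:])]
    avg_error = sum(errors) / len(errors) if errors else 0.0
    # Only a positive percent error (real runtimes slower than predicted) adds a buffer.
    return raw_prediction * (1.0 + max(0.0, avg_error))

# Recent queries ran ~10% slower than predicted, so the next prediction is padded.
print(buffered_prediction(50.0, recent_real=[66.0, 55.0], recent_predicted=[60.0, 50.0]))  # -> 55.0
```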
Comparison of the three algorithms
The experiment in the figure above averages each algorithm's performance over three datasets.
As the graph shows, no method clearly dominates the others.
On the x-axis: the money saved relative to the largest configuration, shown as a normalized value. Further to the right means more money saved, i.e., lower cost.
On the x-axis, we measure a normalized value of money saved. This value is derived from the cost for each query, Cq. We calculate the money saved compared with using the largest, 12-node cluster, for each query.
The y-axis is the fraction of queries that met the quality-of-service requirement, which reflects performance.
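As a rough reading aid for the x-axis metric, the sketch below derives a per-query cost Cq from a hypothetical per-node price and normalizes the savings by the cost of the 12-node cluster. The actual pricing and normalization behind the paper's plot are not reproduced in these notes, so this is an assumption for illustration only.

```python
# Toy worked example of "normalized money saved" relative to the 12-node cluster.
PRICE_PER_NODE_HOUR = 0.10  # hypothetical price

def query_cost(num_nodes, runtime_seconds):
    return num_nodes * PRICE_PER_NODE_HOUR * runtime_seconds / 3600.0

def normalized_savings(chosen_nodes, runtime_chosen, runtime_on_12_nodes):
    c_q = query_cost(chosen_nodes, runtime_chosen)          # cost on the chosen configuration
    c_max = query_cost(12, runtime_on_12_nodes)             # cost on the largest configuration
    return (c_max - c_q) / c_max                            # assumed normalization

# A query answered on 8 nodes in 70 s vs. 50 s on the full 12-node cluster.
print(normalized_savings(8, 70.0, 50.0))   # ~0.07 -> small savings
```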
The following description of the experimental results is well worth learning from!
For Proportional Integral Control (PI), we vary the query window sizes, W, and use a combination of either high or low values for the gain parameters. From these experiments, higher ki values result in a slightly higher QoS than using higher kp. For reinforcement learning (RL), we end up with a higher QoS (75% - 82%) than the PI control methods. We find that Online Learning (OL) yields the highest QoS (above 85%) for several learning rates, LR. Finally, Online Learning with Control Buffer (OL+B) has better QoS compared to OL but at an even higher cost. In this experiment, both OL and OL+B techniques provide the highest QoS, while PI control leads to the largest cost savings.