1. Definitions and Theoretical Background
1.1 Reproducing Kernel Hilbert Spaces (RKHS)
Given a nonempty set $\mathcal{X}$, a Hilbert space $\mathcal{H}$ of functions on $\mathcal{X}$ is called a reproducing kernel Hilbert space if there exists a function $K: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ such that:
- for every $x \in \mathcal{X}$, $K(x,\cdot) \in \mathcal{H}$;
- (reproducing property) for every $f \in \mathcal{H}$ and every $x \in \mathcal{X}$,
$$f(x) = \langle f, K(x,\cdot) \rangle_{\mathcal{H}}$$
The function $K$ is called the reproducing kernel.
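The reproducing property can be checked by hand in a finite-dimensional special case. The sketch below (an illustration, not part of the estimator) uses the linear kernel $K(x,y) = xy$ on $\mathbb{R}$, whose RKHS is the set of linear functions $f(x) = \beta x$ with inner product $\langle \beta x, \gamma x \rangle_{\mathcal{H}} = \beta\gamma$:

```r
# Linear kernel K(x, y) = x * y on the real line.
# Its RKHS is {f(x) = beta * x} with <beta*x, gamma*x>_H = beta * gamma.
beta <- 3.5
f <- function(x) beta * x
x0 <- 2.0
# K(x0, .) is the function y -> x0 * y, i.e. the element with coefficient x0,
# so <f, K(x0, .)>_H = beta * x0 -- which is exactly f(x0).
inner.prod <- beta * x0
stopifnot(isTRUE(all.equal(f(x0), inner.prod)))
```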
1.2 The Quantile Regression Loss Function
For a given quantile level $\tau \in (0,1)$, the check function of quantile regression is defined as
$$\rho_\tau(u) = u(\tau - I(u < 0))$$
where $I(\cdot)$ is the indicator function. This loss function has the following derivative (away from $u = 0$, where it is not differentiable):
$$\frac{\partial \rho_\tau(u)}{\partial u} = \begin{cases} \tau, & u > 0 \\ \tau - 1, & u < 0 \end{cases}$$
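The check function and its asymmetry are easy to verify numerically; the sketch below evaluates $\rho_\tau$ at positive and negative residuals for $\tau = 0.9$, where under-prediction costs nine times as much as over-prediction:

```r
# Pinball (check) loss: rho_tau(u) = u * (tau - 1{u < 0})
rho <- function(u, tau) u * (tau - (u < 0))

rho(1, 0.9)    # positive residual: penalty tau * u       = 0.9
rho(-1, 0.9)   # negative residual: penalty (1 - tau)*|u| = 0.1
stopifnot(
  isTRUE(all.equal(rho(1, 0.9), 0.9)),
  isTRUE(all.equal(rho(-1, 0.9), 0.1))
)
```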
2. Formulating the Optimization Problem
2.1 The Primal Problem
Given a dataset $\{(x_i, y_i)\}_{i=1}^n$ with $(x_i, y_i) \in \mathcal{X} \times \mathbb{R}$, the goal is to estimate the conditional $\tau$-quantile function within the RKHS $\mathcal{H}_K$. The optimization problem is
$$\min_{f \in \mathcal{H}_K} \frac{1}{n}\sum_{i=1}^n \rho_\tau(y_i - f(x_i)) + \lambda \|f\|^2_{\mathcal{H}_K}$$
Here:
- the first term is the empirical risk, measuring the quantile loss between predictions and observations
- the second term is a regularization penalty that controls the complexity of the function
- $\lambda > 0$ is the regularization parameter balancing the two terms
2.2 The Representer Theorem
By the representer theorem for RKHS, the minimizer necessarily has the form
$$f(x) = \sum_{i=1}^n \alpha_i K(x, x_i)$$
where $\alpha = (\alpha_1,\ldots,\alpha_n)^T$ is the coefficient vector to be estimated.
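The practical payoff of the theorem is that the infinite-dimensional search over $\mathcal{H}_K$ collapses to $n$ real coefficients, and evaluating the fitted function is just a weighted sum of kernel evaluations. A minimal sketch, with illustrative (not estimated) coefficients:

```r
x.train <- c(-1, 0, 1)
alpha   <- c(0.5, -0.2, 0.3)                 # illustrative coefficients
K <- function(x, y) exp(-(x - y)^2 / 2)      # Gaussian kernel, sigma = 1

# f(x) = sum_i alpha_i K(x, x_i): a finite kernel expansion
f <- function(x) sum(alpha * K(x, x.train))

# evaluating at x = 0 uses K(0, x_i) = exp(-x_i^2 / 2)
stopifnot(isTRUE(all.equal(f(0), sum(alpha * exp(-x.train^2 / 2)))))
```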
3. Computation
3.1 An Equivalent Quadratic Program
Using the representer theorem and introducing slack variables, the primal problem can be rewritten as the quadratic program
$$\begin{aligned} \min_{\alpha,\xi,\eta}\quad & \sum_{i=1}^n (\tau\xi_i + (1-\tau)\eta_i) + \lambda \alpha^T K \alpha \\ \text{s.t.}\quad & y_i = \sum_{j=1}^n \alpha_j K(x_i,x_j) + \xi_i - \eta_i, \quad i=1,\ldots,n \\ & \xi_i, \eta_i \geq 0, \quad i=1,\ldots,n \end{aligned}$$
where:
- $K \in \mathbb{R}^{n \times n}$ is the kernel matrix with $K_{ij} = K(x_i,x_j)$
- $\xi_i, \eta_i$ are the positive and negative parts of the $i$-th residual
- in matrix form, $y = K\alpha + \xi - \eta$
3.2 Choice of Kernel
Commonly used kernels (written here for scalar inputs) include:
- Gaussian (RBF) kernel: $K(x,y) = \exp\left(-\frac{(x-y)^2}{2\sigma^2}\right)$, where $\sigma > 0$ controls the bandwidth
- Polynomial kernel: $K(x,y) = (xy + c)^d$, where $d \in \mathbb{N}$ is the degree and $c \geq 0$ is a bias term
- Linear kernel: $K(x,y) = xy + c$, the special case of the polynomial kernel with $d = 1$
- Laplacian kernel: $K(x,y) = \exp\left(-\frac{|x-y|}{\sigma}\right)$, similar to the Gaussian kernel but using the L1 distance
- Sigmoid kernel: $K(x,y) = \tanh(\alpha xy + c)$, where $\alpha > 0$ controls the slope and $c \geq 0$ the intercept
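A few sanity checks on the kernels listed above — symmetry in their arguments, and $K(x,x) = 1$ for the Gaussian and Laplacian kernels — can be run directly (a sketch using the scalar forms):

```r
rbf  <- function(x, y, sigma = 1) exp(-(x - y)^2 / (2 * sigma^2))
lap  <- function(x, y, sigma = 1) exp(-abs(x - y) / sigma)
poly <- function(x, y, d = 2, c = 1) (x * y + c)^d

x <- 1.2; y <- -0.7
stopifnot(
  isTRUE(all.equal(rbf(x, y), rbf(y, x))),   # symmetry
  isTRUE(all.equal(poly(x, y), poly(y, x))),
  isTRUE(all.equal(rbf(x, x), 1)),           # K(x, x) = 1 for the RBF kernel
  isTRUE(all.equal(lap(x, x), 1))
)
```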
4. Model Selection and Parameter Estimation
4.1 Cross-Validation (CV)
K-fold cross-validation is used to select the hyperparameter combination $(\lambda, \theta)$, where $\theta$ denotes the kernel parameters. The cross-validation error is defined as
$$\text{CV}(\lambda,\theta) = \frac{1}{K}\sum_{k=1}^K \frac{1}{|I_k|}\sum_{i\in I_k} \rho_\tau(y_i - \hat{f}_{\lambda,\theta}^{(-k)}(x_i))$$
where:
- $I_k$ is the index set of the $k$-th test fold
- $|I_k|$ is the number of samples in that fold
- $\hat{f}_{\lambda,\theta}^{(-k)}$ is the estimate trained on all data except the $k$-th fold
The implementation below parallelizes this grid search to speed it up.
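The CV criterion is simply an average over folds of the per-fold mean pinball loss; once the out-of-fold residuals $y_i - \hat{f}^{(-k)}(x_i)$ are in hand, it is a one-liner. A toy sketch with made-up residuals for $K = 2$ folds:

```r
rho <- function(u, tau) u * (tau - (u < 0))  # pinball loss

tau <- 0.5
# made-up out-of-fold residuals for K = 2 folds
fold.residuals <- list(c(0.2, -0.1), c(0.4, -0.3, 0.1))

# CV(lambda, theta) = mean over folds of the per-fold mean loss
cv <- mean(sapply(fold.residuals, function(r) mean(rho(r, tau))))
stopifnot(isTRUE(all.equal(cv, (0.075 + 0.4 / 3) / 2)))
```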
4.2 Generalized Cross-Validation (GCV) (TODO)
Plain CV remains expensive on large datasets.
4.3 Prediction (Out-of-Sample)
For a new input point $x_{\text{new}}$, the prediction is
$$\hat{f}(x_{\text{new}}) = \sum_{i=1}^n \hat{\alpha}_i K(x_{\text{new}}, x_i)$$
where $\hat{\alpha}$ is the coefficient vector estimated on the full training data with the selected hyperparameters.
4.4 Confidence Intervals (TODO)
5. Theoretical Properties (TODO)
5.1 Consistency
Under suitable conditions, as the sample size $n \rightarrow \infty$ the estimated quantile function converges to the true conditional quantile function:
$$\sup_{x \in \mathcal{X}} |\hat{f}_n(x) - f_\tau(x)| \rightarrow 0 \quad \text{a.s.}$$
where $f_\tau(x)$ is the true conditional $\tau$-quantile function.
5.2 Convergence Rate
Under smoothness assumptions, the convergence rate is
$$\|\hat{f}_n - f_\tau\|_{\infty} = O_p\left(\left(\frac{\log n}{n}\right)^{\frac{s}{2s+d}}\right)$$
where $s$ is the smoothness of the true function and $d$ is the dimension of the input space.
This method combines the robustness of quantile regression with the nonlinear modeling power of kernel methods, giving a flexible and effective framework for estimating conditional quantile functions. A suitable choice of kernel and parameters captures nonlinear structure in the data, while the regularization term guards against overfitting and improves generalization.
Code (R 4.4.3, TODO)
library(CVXR)       # convex optimization interface used for the QP
library(ggplot2)
library(progress)
library(pbmcapply)  # parallel lapply with a progress bar
library(patchwork)
library(viridis)
# `%||%` is in base R since 4.4.0; define a fallback for older versions
`%||%` <- function(a, b) if (is.null(a)) b else a
# Generate example data
set.seed(123)  # for reproducibility
n = 300
x <- seq(-5, 5, length.out = n)  # n evenly spaced x values
y_true <- 2 * sin(x) + 3         # true (nonlinear) mean function
y <- y_true + rnorm(n, 0, 1)     # noisy observations
# Mean pinball (quantile) loss, matching the CV criterion
compute.quantile.loss <- function(y.true, y.pred, tau) {
  residuals <- y.true - y.pred
  mean(residuals * (tau - (residuals < 0)))
}
# Kernel functions
kernels <- function(kernel.type = "radial", kernel.params = list()) {
  # Gaussian (radial basis) kernel
  radial <- function(x, y, sigma = 1) {
    exp(-((x - y)^2)/(2 * sigma^2))
  }
  # Linear kernel
  linear <- function(x, y, c = 0) {
    x * y + c
  }
  # Polynomial kernel
  polynomial <- function(x, y, degree = 2, c = 1) {
    (x * y + c)^degree
  }
  # Laplacian kernel
  laplacian <- function(x, y, sigma = 1) {
    exp(-abs(x - y)/sigma)
  }
  # Sigmoid kernel
  sigmoid <- function(x, y, alpha = 1, c = 0) {
    tanh(alpha * x * y + c)
  }
  # Return the requested kernel with its parameters bound
  switch(kernel.type,
         "radial" = function(x, y) {
           radial(x, y,
                  sigma = kernel.params$sigma %||% 1)
         },
         "linear" = function(x, y) {
           linear(x, y,
                  c = kernel.params$c %||% 0)
         },
         "polynomial" = function(x, y) {
           polynomial(x, y,
                      degree = kernel.params$degree %||% 2,
                      c = kernel.params$c %||% 1)
         },
         "laplacian" = function(x, y) {
           laplacian(x, y,
                     sigma = kernel.params$sigma %||% 1)
         },
         "sigmoid" = function(x, y) {
           sigmoid(x, y,
                   alpha = kernel.params$alpha %||% 1,
                   c = kernel.params$c %||% 0)
         },
         stop("Unsupported kernel type"))
}
kernel.matrix <- function(x, kernel.func) {
n <- length(x)
K.reg <- matrix(nrow = n, ncol = n)
for (i in 1:n) {
for (j in 1:n) {
K.reg[i, j] <- kernel.func(x[i], x[j])
}
}
return(K.reg)
}
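The double loop in kernel.matrix makes $O(n^2)$ calls into R. Since the kernels above all operate elementwise on numeric vectors, outer() builds the same matrix much faster — a sketch, assuming 1-D numeric inputs and an elementwise-vectorized kernel (the name kernel.matrix.fast is ours):

```r
kernel.matrix.fast <- function(x, kernel.func) {
  # outer() evaluates kernel.func over the full grid of (x[i], x[j]) pairs;
  # this requires kernel.func to be vectorized, which holds for the
  # closures returned by kernels() on numeric 1-D data
  outer(x, x, kernel.func)
}

k <- function(x, y) exp(-(x - y)^2 / 2)  # Gaussian kernel, sigma = 1
x <- c(-1, 0, 2)
K1 <- kernel.matrix.fast(x, k)
stopifnot(isSymmetric(K1), isTRUE(all.equal(diag(K1), rep(1, 3))))
```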
# Main fitting function
solve.rkhs.quantile.regression <- function(x, y, tau,
                                           kernel.type = "radial",
                                           kernel.params = list(),
                                           lambda) {
  n <- length(y)
  alpha <- Variable(n)
  xi <- Variable(n, nonneg = TRUE)
  eta <- Variable(n, nonneg = TRUE)
  kernel.func <- kernels(kernel.type, kernel.params)
  K.reg <- kernel.matrix(x, kernel.func)
  # Build the quadratic program
  objective <- Minimize(sum(tau * xi + (1 - tau) * eta) +
                          lambda * quad_form(alpha, K.reg))
  constraints <- list(
    y == K.reg %*% alpha + xi - eta
  )
  # Solve it
  problem <- Problem(objective, constraints)
  solution <- solve(problem)
  # Fitted values
  alpha.hat <- solution$getValue(alpha)
  fitted.values <- K.reg %*% alpha.hat
  # Residuals
  residuals <- y - fitted.values
  # Effective degrees of freedom (heuristic): for squared loss the
  # smoother matrix is S = K (K + lambda I)^{-1}, so df = tr(S)
  S <- K.reg %*% solve(K.reg + lambda * diag(n))
  df <- sum(diag(S))
  # Diagnostic statistics
  mse <- mean(residuals^2)
  mae <- mean(abs(residuals))
  quantile.loss <- mean(tau * pmax(residuals, 0) + (tau - 1) * pmin(residuals, 0))
  return(list(
    # Model parameters
    alpha = alpha.hat,
    kernel.type = kernel.type,
    kernel.params = kernel.params,
    lambda = lambda,
    tau = tau,
    x.train = x,
    # Fit results
    fitted.values = fitted.values,
    residuals = residuals,
    # Diagnostics
    df = df,
    mse = mse,
    mae = mae,
    quantile.loss = quantile.loss,
    # Optimization info
    convergence = solution$status,
    objective = solution$value,
    # Kernel matrix
    Kmat = K.reg
  ))
}
select.params.cv <- function(x, y, tau,
                             kernel.type = "radial",
                             kernel.params.grid = list(
                               sigma = c(0.1, 0.5, 1, 2)  # default grid for the Gaussian kernel
                             ),
                             lambda.grid = 10^seq(-3, 0, by = 0.5),
                             K = 5,
                             parallel = FALSE) {
  # Basic setup
  n <- length(y)
  fold.indices <- sample(rep(1:K, length.out = n))
  # Build the parameter grid for the chosen kernel type
  # (expand.grid varies the kernel parameter fastest, lambda slowest)
  param.grid <- switch(kernel.type,
                       "radial" = expand.grid(
                         sigma = kernel.params.grid$sigma,
                         lambda = lambda.grid
                       ),
                       "polynomial" = expand.grid(
                         degree = kernel.params.grid$degree %||% c(2, 3),
                         c = kernel.params.grid$c %||% c(0, 1),
                         lambda = lambda.grid
                       ),
                       "linear" = expand.grid(
                         c = kernel.params.grid$c %||% 0,
                         lambda = lambda.grid
                       ),
                       "laplacian" = expand.grid(
                         sigma = kernel.params.grid$sigma,
                         lambda = lambda.grid
                       ),
                       "sigmoid" = expand.grid(
                         alpha = kernel.params.grid$alpha %||% c(0.5, 1),
                         c = kernel.params.grid$c %||% c(0, 1),
                         lambda = lambda.grid
                       )
  )
  # CV error for a single parameter combination
  compute.cv.error <- function(param.idx) {
    # Current parameter combination
    current.params <- param.grid[param.idx, ]
    current.lambda <- current.params$lambda
    # Kernel parameters (drop the lambda column)
    kernel.params <- as.list(current.params[names(current.params) != "lambda"])
    # Kernel for this combination
    current.kernel <- kernels(
      kernel.type = kernel.type,
      kernel.params = kernel.params
    )
    cv.error <- 0
    fold.results <- list()
    # Loop over folds
    for (k in 1:K) {
      test.idx <- which(fold.indices == k)
      train.idx <- which(fold.indices != k)
      x.train <- x[train.idx]
      y.train <- y[train.idx]
      x.test <- x[test.idx]
      y.test <- y[test.idx]
      # Try to fit the model
      fit <- try({
        solve.rkhs.quantile.regression(
          x.train, y.train,
          tau = tau,
          kernel.type = kernel.type,
          kernel.params = kernel.params,
          lambda = current.lambda
        )
      }, silent = TRUE)
      if (!inherits(fit, "try-error")) {
        # Kernel matrix between test and training points
        K.test <- matrix(nrow = length(x.test), ncol = length(x.train))
        for (t in seq_along(x.test)) {
          for (s in seq_along(x.train)) {
            K.test[t, s] <- current.kernel(x.test[t], x.train[s])
          }
        }
        # Predictions and fold error
        y.pred <- K.test %*% fit$alpha
        fold.error <- compute.quantile.loss(y.test, y.pred, tau)
        cv.error <- cv.error + fold.error
        fold.results[[k]] <- list(
          error = fold.error,
          predictions = y.pred,
          actual = y.test,
          convergence = fit$convergence
        )
      } else {
        cv.error <- Inf
        fold.results[[k]] <- list(
          error = Inf,
          convergence = "failed"
        )
        break
      }
    }
    list(
      mean.error = cv.error / K,
      fold.results = fold.results,
      kernel.params = kernel.params,
      lambda = current.lambda
    )
  }
  # Evaluate the grid serially or in parallel
  if (parallel) {
    cv.results <- pbmclapply(
      1:nrow(param.grid),
      compute.cv.error,
      mc.cores = max(1L, parallel::detectCores() - 1L)
    )
  } else {
    pb <- progress_bar$new(
      format = " Computing [:bar] :percent eta: :eta",
      total = nrow(param.grid),
      clear = FALSE,
      width = 60
    )
    cv.results <- list()
    for (i in 1:nrow(param.grid)) {
      cv.results[[i]] <- compute.cv.error(i)
      pb$tick()
    }
  }
  # Reshape CV errors: rows = kernel-parameter combinations, cols = lambda.
  # expand.grid varies the kernel parameter fastest, so fill column-wise.
  n.kernel.params <- nrow(param.grid) / length(lambda.grid)
  cv.errors <- matrix(
    sapply(cv.results, function(x) x$mean.error),
    nrow = n.kernel.params,
    ncol = length(lambda.grid),
    byrow = FALSE
  )
  # Best parameter combination, selected on the flat error vector,
  # which is aligned with the rows of param.grid
  best.params.idx <- which.min(sapply(cv.results, function(x) x$mean.error))
  best.params <- param.grid[best.params.idx, ]
  best.lambda <- best.params$lambda
  best.kernel.params <- as.list(best.params[names(best.params) != "lambda"])
  # Refit on all the data with the selected parameters
  best.fit <- solve.rkhs.quantile.regression(
    x, y,
    tau = tau,
    kernel.type = kernel.type,
    kernel.params = best.kernel.params,
    lambda = best.lambda
  )
  # CV diagnostics
  cv.stats <- list(
    mean.cv.error = mean(cv.errors[is.finite(cv.errors)]),
    sd.cv.error = sd(cv.errors[is.finite(cv.errors)]),
    min.cv.error = min(cv.errors),
    max.cv.error = max(cv.errors[is.finite(cv.errors)]),
    convergence.rate = mean(!is.infinite(as.vector(cv.errors))),
    best.kernel.params = best.kernel.params,
    best.lambda = best.lambda
  )
  # Return the full result
  structure(list(
    # Selected parameters
    kernel.type = kernel.type,
    best.kernel.params = best.kernel.params,
    best.lambda = best.lambda,
    # Cross-validation results
    cv.errors = cv.errors,
    cv.results = cv.results,
    param.grid = param.grid,  # full parameter grid
    lambda.grid = lambda.grid,
    # Best model
    best.fit = best.fit,
    # Diagnostics
    cv.diagnostics = cv.stats,
    # Original data
    x = x,
    y = y,
    tau = tau,
    # Settings
    K = K,
    parallel = parallel,
    timestamp = Sys.time()
  ), class = "rkhs.quantile.fit")
}
# Prediction for new data points
predict.rkhs.quantile <- function(object, newx) {
  # Arguments:
  #   object: fitted model (training data and model parameters)
  #   newx:   new input points to predict at
  # Recover the kernel function and its parameters
  kernel.func <- kernels(
    kernel.type = object$kernel.type,
    kernel.params = object$kernel.params
  )
  # Kernel matrix between the new points and the training points
  K.new <- matrix(nrow = length(newx), ncol = length(object$x.train))
  for (i in seq_along(newx)) {
    for (j in seq_along(object$x.train)) {
      K.new[i, j] <- kernel.func(newx[i], object$x.train[j])
    }
  }
  # Predictions
  y.pred <- K.new %*% object$alpha
  return(y.pred)
}
# Model diagnostic plots
plot.rkhs.quantile <- function(fit) {
  # 2 x 2 panel of diagnostics
  old.par <- par(mfrow = c(2, 2))
  on.exit(par(old.par))  # restore graphics settings on exit
  # Residuals vs fitted values
  plot(fit$fitted.values, fit$residuals,
       xlab = "Fitted Values",
       ylab = "Residuals",
       main = "Residuals vs Fitted",
       pch = 20)
  abline(h = 0, lty = 2, col = "gray")
  # Normal Q-Q plot of the residuals
  qqnorm(fit$residuals,
         main = "Normal Q-Q Plot",
         pch = 20)
  qqline(fit$residuals, col = "red")
  # Residual density
  res.density <- density(fit$residuals)
  plot(res.density,
       main = "Residuals Density",
       xlab = "Residuals",
       ylab = "Density")
  polygon(res.density, col = "lightgray", border = "gray")
  # Scale-location plot (square root of absolute residuals)
  plot(fit$fitted.values, sqrt(abs(fit$residuals)),
       xlab = "Fitted Values",
       ylab = expression(sqrt("|Residuals|")),
       main = "Scale-Location",
       pch = 20)
  # Smoother to highlight any trend
  try({
    smooth <- loess.smooth(fit$fitted.values, sqrt(abs(fit$residuals)))
    lines(smooth$x, smooth$y, col = "red", lwd = 2)
  }, silent = TRUE)
}
# CV visualization
plot.cv.results <- function(fit) {
  # Name of the kernel parameter (first non-lambda column of the grid)
  kernel.param <- setdiff(names(fit$param.grid), "lambda")[1]
  # Build the long data frame directly from the parameter grid and the
  # flat vector of CV errors, so no matrix reshaping is needed
  cv.errors.df <- data.frame(
    param = fit$param.grid[[kernel.param]],
    lambda = fit$param.grid$lambda,
    error = sapply(fit$cv.results, function(r) r$mean.error)
  )
  # Recode Inf (failed fits) as NA
  cv.errors.df$error[is.infinite(cv.errors.df$error)] <- NA
  # Heatmap of CV error over the grid, with the optimum marked in red
  p1 <- ggplot(cv.errors.df, aes(x = lambda, y = param)) +
    geom_tile(aes(fill = error)) +
    scale_fill_viridis_c(na.value = "grey90") +
    geom_point(
      data = data.frame(
        lambda = fit$best.lambda,
        param = unlist(fit$best.kernel.params)[1]
      ),
      color = "red",
      size = 3
    ) +
    labs(
      title = paste("Cross-validation Error Heatmap for", fit$kernel.type, "kernel"),
      x = "Lambda",
      y = kernel.param
    ) +
    theme_minimal() +
    theme(panel.grid = element_blank())
  # Parameter-effect curves
  p2 <- ggplot(cv.errors.df[!is.na(cv.errors.df$error), ],
               aes(x = lambda, y = error, color = factor(param))) +
    geom_line() +
    scale_x_log10() +
    labs(
      x = "Lambda (log scale)",
      y = "CV Error",
      color = kernel.param  # actual parameter name
    ) +
    theme_minimal()
  # Stack the two plots
  if (requireNamespace("patchwork", quietly = TRUE)) {
    p1 / p2
  } else {
    p1
  }
}
cv.param <- select.params.cv(
  x, y,
  tau = 0.5,
  kernel.type = "radial",
  kernel.params.grid = list(sigma = c(0.1, 0.5, 1, 1.5, 2, 2.5, 3)),
  lambda.grid = seq(0.5, 5, 0.5),  # lambda must be strictly positive
  parallel = TRUE
)
plot.cv.results(cv.param)
# Inspect diagnostics
print(cv.param$cv.diagnostics)
# Predict at new points
newx <- seq(min(x), max(x), length.out = 100)
y.pred.new <- predict.rkhs.quantile(cv.param$best.fit, newx)
# Diagnostic plots
plot.rkhs.quantile(cv.param$best.fit)
# Fit at several quantile levels with the selected parameters
tau_levels <- c(0.25, 0.5, 0.75)
data.rq <- data.frame()
for (tau in tau_levels) {
  fit.rq <- solve.rkhs.quantile.regression(x, y,
                                           tau = tau,
                                           kernel.type = 'radial',
                                           kernel.params = cv.param$best.kernel.params,
                                           lambda = cv.param$best.lambda)
  res = data.frame(x = x, y = fit.rq$fitted.values, tau = tau)
  data.rq <- rbind(data.rq, res)
}
# Mean-regression baseline: kernel ridge (squared loss) in closed form,
# alpha = (K + lambda I)^{-1} y; labeled "OLS" in the legend below
K.mean <- kernel.matrix(x, kernels("radial", list(sigma = 1)))
alpha.mean <- solve(K.mean + 0.2 * diag(n), y)
data.ols = data.frame(x = x, y = as.vector(K.mean %*% alpha.mean))
data.ols$type = "OLS"                        # type label for the legend
data.rq$type = paste0("tau= ", data.rq$tau)  # descriptive label per quantile
data_points = data.frame(x = x, y = y)
ggplot() +
  geom_point(data = data_points,
             aes(x = x, y = y),
             alpha = 0.5,
             color = "gray50") +
  # Mean-regression line
  geom_line(data = data.ols,
            aes(x = x, y = y,
                color = "OLS"),  # treat "OLS" as its own legend category
            linetype = "dashed",
            linewidth = 1) +
  # Quantile regression lines
  geom_line(data = data.rq,
            aes(x = x, y = y,
                color = type),
            linewidth = 1) +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10),
    plot.title = element_text(size = 14, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    panel.grid.major = element_line(color = "gray90"),
    panel.grid.minor = element_blank()
  ) +
  # Colors keyed to the actual category labels
  scale_color_manual(
    name = "Regression",
    values = c(
      "OLS" = "black",
      "tau= 0.25" = "#E41A1C",
      "tau= 0.5" = "#4DAF4A",
      "tau= 0.75" = "#377EB8"
    )
  ) +
  # Legend layout
  guides(color = guide_legend(
    title = "Regression",
    nrow = 1,
    override.aes = list(
      linetype = c("dashed", "solid", "solid", "solid")  # one per category
    )
  ))
Results (figures omitted)