Reinforcement Learning with Code 【Chapter 9. Policy Gradient Methods】


This note records how the author began to learn RL. Both theoretical understanding and code practice are presented. Much of the material is referenced from Shiyu Zhao's Mathematical Foundation of Reinforcement Learning.

Contents

  • Reinforcement Learning with Code
    • Chapter 9. Policy Gradient Methods
      • 9.1 Basic idea of policy gradient
      • 9.2 Metrics to define optimal policies
      • 9.3 Gradients of the metrics
      • 9.4 Policy gradient by Monte Carlo estimation: REINFORCE
    • Reference

Chapter 9. Policy Gradient Methods

The idea of function approximation can be applied to represent not only state/action values but also policies. Up to now in this book, policies have been represented by tables: the action probabilities of all states are stored in a table $\pi(a|s)$, each entry of which is indexed by a state and an action. In this chapter, we show that policies can be represented by parameterized functions denoted as $\pi(a|s,\theta)$, where $\theta\in\mathbb{R}^m$ is a parameter vector. The function representation is also sometimes written as $\pi(a,s,\theta)$, $\pi_\theta(a|s)$, or $\pi_\theta(a,s)$.

When policies are represented as a function, optimal policies can be found by optimizing certain scalar metrics. Such methods are called policy gradient methods.

9.1 Basic idea of policy gradient

How to define optimal policies? When represented as a table, a policy $\pi$ is defined as optimal if it maximizes every state value. When represented by a function, a policy $\pi$ is fully determined by $\theta$ together with the function structure. The policy is defined as optimal if it maximizes certain scalar metrics, which we introduce later.

How to update policies? When represented as a table, a policy $\pi$ can be updated by directly changing the entries in the table. However, when represented by a parameterized function, a policy $\pi$ cannot be updated in this way anymore. Instead, it can only be improved by updating the parameter $\theta$. We can use gradient-based methods that optimize some metric to update the parameter $\theta$.
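To make the parameterized representation concrete, a common choice (used here only as an illustration, not prescribed by the text) is a softmax policy over linear scores $h(s,a,\theta)=\theta^T\phi(s,a)$ for some feature map $\phi$. A minimal NumPy sketch, where the feature map and dimensions are made up for the example:

```python
import numpy as np

def softmax_policy(theta, phi, s, actions):
    """pi(a|s,theta) for a softmax policy with linear scores h(s,a,theta) = theta^T phi(s,a).

    theta   : (m,) parameter vector
    phi     : callable (s, a) -> (m,) feature vector (hypothetical feature map)
    actions : iterable of actions available in state s
    """
    scores = np.array([theta @ phi(s, a) for a in actions])
    scores -= scores.max()              # subtract the max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()          # nonnegative and sums to 1 over the actions

# toy example: 2 states, 3 actions, one-hot features (purely illustrative)
n_states, n_actions = 2, 3
phi = lambda s, a: np.eye(n_states * n_actions)[s * n_actions + a]
theta = np.zeros(n_states * n_actions)  # theta = 0 gives the uniform policy
print(softmax_policy(theta, phi, s=0, actions=range(n_actions)))  # [1/3, 1/3, 1/3]
```

Updating the policy then means changing $\theta$, which changes all action probabilities at once; this is exactly what the gradient-based methods below do.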

9.2 Metrics to define optimal policies

The first metric is the average state value, or simply the average value. Let

$$v_\pi = [\cdots, v_\pi(s), \cdots]^T \in \mathbb{R}^{|\mathcal{S}|}, \qquad d_\pi = [\cdots, d_\pi(s), \cdots]^T \in \mathbb{R}^{|\mathcal{S}|}$$

be the vector of state values and the vector of the state distribution, respectively. Here, $d_\pi(s)\ge 0$ is the weight for state $s$ and satisfies $\sum_s d_\pi(s)=1$. The average-value metric is defined as

$$\begin{aligned} \textcolor{red}{\bar{v}_\pi} & \textcolor{red}{\triangleq d_\pi^T v_\pi} \\ & \textcolor{red}{= \sum_s d_\pi(s)v_\pi(s)} \\ & \textcolor{red}{= \mathbb{E}[v_\pi(S)]} \end{aligned}$$

where $S \sim d_\pi$. As its name suggests, $\bar{v}_\pi$ is simply a weighted average of the state values. The distribution $d_\pi$ is the stationary distribution under $\pi$, obtained by solving the equation

$$d^T_\pi P_\pi = d^T_\pi$$

where $P_\pi$ is the state transition probability matrix.
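As an illustration (the transition matrix below is made up, not from the text), $d_\pi$ can be computed by power iteration on $d^T P_\pi = d^T$:

```python
import numpy as np

# hypothetical state transition matrix P_pi of a 3-state Markov chain under pi
P_pi = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.2, 0.3, 0.5]])

d = np.ones(3) / 3                 # start from a uniform guess
for _ in range(1000):              # power iteration: d^T <- d^T P_pi
    d = d @ P_pi
d /= d.sum()                       # renormalize against round-off drift

print(d)                           # the stationary distribution d_pi
print(np.allclose(d @ P_pi, d))    # True: d satisfies d^T P_pi = d^T
```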

The second metric is the average one-step reward, or simply the average reward. Let

$$r_\pi = [\cdots, r_\pi(s),\cdots]^T \in \mathbb{R}^{|\mathcal{S}|}$$

be the vector of one-step immediate rewards. Here

$$r_\pi(s) = \sum_a \pi(a|s)r(s,a)$$

is the average of the one-step immediate rewards that can be obtained starting from state $s$, and $r(s,a)=\mathbb{E}[R|s,a]=\sum_r r\, p(r|s,a)$ is the average of the one-step immediate rewards that can be obtained after taking action $a$ at state $s$. Then the metric is defined as

$$\begin{aligned} \textcolor{red}{\bar{r}_\pi} & \textcolor{red}{\triangleq d_\pi^T r_\pi} \\ & \textcolor{red}{= \sum_s d_\pi(s)\sum_a \pi(a|s) \sum_r r\, p(r|s,a) } \\ & \textcolor{red}{= \sum_s d_\pi(s)\sum_a \pi(a|s)r(s,a) } \\ & \textcolor{red}{= \sum_s d_\pi(s)r_\pi(s)} \\ & \textcolor{red}{= \mathbb{E}[r_\pi(S)]} \end{aligned}$$

where $S\sim d_\pi$. As its name suggests, $\bar{r}_\pi$ is simply a weighted average of the one-step immediate rewards.

The third metric is the state value of a specific starting state, $v_\pi(s_0)$. For some tasks, we can only start from a specific state $s_0$. In this case, we only care about the long-term return starting from $s_0$. This metric can also be viewed as a weighted average of the state values:

$$\textcolor{red}{v_\pi(s_0) = \sum_{s\in\mathcal{S}} d_0(s) v_\pi(s)}$$

where $d_0(s_0)=1$ and $d_0(s)=0$ for $s\ne s_0$.
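Since each metric is a weighted average, computing it is a single dot product once the weights and values are known. A tiny numerical illustration with made-up vectors for a 3-state problem:

```python
import numpy as np

# illustrative numbers for a 3-state problem (not from the text)
d_pi = np.array([0.5, 0.3, 0.2])   # stationary distribution under pi
v_pi = np.array([4.0, 2.0, 1.0])   # state values under pi
r_pi = np.array([1.0, 0.5, 0.2])   # average one-step rewards under pi
d_0  = np.array([1.0, 0.0, 0.0])   # every episode starts from s_0 = state 0

v_bar   = d_pi @ v_pi   # average state value          \bar{v}_pi
r_bar   = d_pi @ r_pi   # average one-step reward      \bar{r}_pi
v_start = d_0  @ v_pi   # value of the starting state  v_pi(s_0)
print(v_bar, r_bar, v_start)       # 2.8, 0.69, 4.0
```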

We aim to search for the value of the parameter $\theta$ that maximizes these metrics.

9.3 Gradients of the metrics

Theorem 9.1 (Policy gradient theorem). The gradient of the average-reward metric $\bar{r}_\pi$ is

$$\textcolor{blue}{\nabla_\theta \bar{r}_\pi(\theta) \simeq \sum_s d_\pi(s)\sum_a \nabla_\theta \pi(a|s,\theta)\, q_\pi(s,a)}$$

where $\nabla_\theta \pi$ is the gradient of $\pi$ with respect to $\theta$. Here $\simeq$ refers to either strict or approximate equality. In particular, it is a strict equality in the undiscounted case where $\gamma=1$ and an approximate equality in the discounted case where $0<\gamma<1$. In the discounted case, the approximation is more accurate when $\gamma$ is closer to $1$. Moreover, the equation has a more compact and useful form expressed in terms of an expectation:

$$\textcolor{red}{\nabla_\theta \bar{r}_\pi(\theta) \simeq \mathbb{E} [\nabla_\theta \ln \pi(A|S,\theta)\, q_\pi(S,A)]}$$

where $\ln$ is the natural logarithm, $S\sim d_\pi$, and $A\sim \pi(\cdot|S,\theta)$.

Why are the two equations above equivalent? Here is the derivation.

$$\begin{aligned} \nabla_\theta \bar{r}_\pi(\theta) & \simeq \sum_s d_\pi(s)\sum_a \nabla_\theta \pi(a|s,\theta)\, q_\pi(s,a) \\ & = \mathbb{E}\Big[ \sum_a \nabla_\theta \pi(a|S,\theta)\, q_\pi(S,a) \Big] \end{aligned}$$

where $S \sim d_\pi$. Furthermore, consider the function $\ln\pi$, where $\ln$ is the natural logarithm:

$$\begin{aligned} \nabla_\theta \ln \pi (a|s,\theta) & = \frac{\nabla_\theta \pi(a|s,\theta)}{\pi(a|s,\theta)} \\ \to \nabla_\theta \pi(a|s,\theta) &= \pi(a|s,\theta)\, \nabla_\theta \ln \pi (a|s,\theta) \end{aligned}$$

Substituting this expression back into the expectation gives

$$\begin{aligned} \nabla_\theta \bar{r}_\pi(\theta) & = \mathbb{E}\Big[ \sum_a \nabla_\theta \pi(a|S,\theta)\, q_\pi(S,a) \Big] \\ & = \mathbb{E}\Big[ \sum_a \pi(a|S,\theta)\, \nabla_\theta \ln \pi (a|S,\theta)\, q_\pi(S,a) \Big] \\ & = \mathbb{E}\Big[ \nabla_\theta \ln \pi (A|S,\theta)\, q_\pi(S,A) \Big] \end{aligned}$$

where $A \sim \pi(\cdot|S,\theta)$.
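As a worked example (my own addition), take the softmax-linear parameterization sketched in Section 9.1, $\pi(a|s,\theta) \propto e^{\theta^T\phi(s,a)}$. Then the log-gradient inside the expectation has the closed form

$$\nabla_\theta \ln \pi(a|s,\theta) = \nabla_\theta\Big[\theta^T\phi(s,a) - \ln \sum_b e^{\theta^T\phi(s,b)}\Big] = \phi(s,a) - \sum_b \pi(b|s,\theta)\,\phi(s,b),$$

so each sampled term $\nabla_\theta \ln \pi(a|s,\theta)\, q_\pi(s,a)$ pushes $\theta$ toward the features of actions whose value is large relative to the feature average under the current policy.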

Next we show that the average one-step reward $\bar{r}_\pi$ and the average state value $\bar{v}_\pi$ are equivalent metrics. When the discount rate $\gamma\in[0,1)$ is given, it holds that

$$\textcolor{blue}{\bar{r}_\pi = (1-\gamma)\bar{v}_\pi}$$

Proof. Note that $\bar{v}_\pi(\theta)=d^T_\pi v_\pi$ and $\bar{r}_\pi=d^T_\pi r_\pi$, where $v_\pi$ and $r_\pi$ satisfy the Bellman equation $v_\pi=r_\pi + \gamma P_\pi v_\pi$. Multiplying both sides of the Bellman equation by $d_\pi^T$ on the left gives

$$\bar{v}_\pi = \bar{r}_\pi + \gamma d^T_\pi P_\pi v_\pi = \bar{r}_\pi + \gamma d^T_\pi v_\pi = \bar{r}_\pi + \gamma \bar{v}_\pi$$

which implies $\bar{r}_\pi = (1-\gamma)\bar{v}_\pi$.
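This identity is easy to verify numerically. A sketch with a made-up $P_\pi$ and $r_\pi$ (my own example): solve the Bellman equation for $v_\pi$, take $d_\pi$ as the stationary distribution of $P_\pi$, and compare both sides.

```python
import numpy as np

gamma = 0.9
# hypothetical P_pi and r_pi for a 3-state problem
P_pi = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.2, 0.3, 0.5]])
r_pi = np.array([1.0, 0.0, 0.5])

# v_pi from the Bellman equation v = r + gamma * P_pi * v
v_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)

# d_pi: left eigenvector of P_pi for eigenvalue 1, normalized to sum to 1
eigvals, eigvecs = np.linalg.eig(P_pi.T)
d_pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
d_pi /= d_pi.sum()

r_bar, v_bar = d_pi @ r_pi, d_pi @ v_pi
print(r_bar, (1 - gamma) * v_bar)   # the two numbers agree
```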

Theorem 9.2 (Gradient of $v_\pi(s_0)$ in the discounted case). In the discounted case where $\gamma \in [0,1)$, the gradient of $v_\pi(s_0)$ is

$$\nabla_\theta v_\pi(s_0) = \mathbb{E}[\nabla_\theta \ln \pi(A|S, \theta)\, q_\pi(S,A)]$$

where $S \sim \rho_\pi$ and $A \sim \pi(\cdot|S,\theta)$. Here, the state distribution $\rho_\pi$ is

$$\rho_\pi(s) = \Pr_\pi (s|s_0) = \sum_{k=0}^{\infty} \gamma^k \Pr (s_0\to s, k, \pi) = [(I_n - \gamma P_\pi)^{-1}]_{s_0,s}$$

which is the discounted total probability of transitioning from $s_0$ to $s$ under policy $\pi$.
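A minimal sketch of this quantity (the transition matrix is again a made-up example): $\rho_\pi$ is simply the $s_0$-th row of $(I - \gamma P_\pi)^{-1}$.

```python
import numpy as np

gamma, s0 = 0.9, 0
# hypothetical P_pi for a 3-state problem
P_pi = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.2, 0.3, 0.5]])

rho_pi = np.linalg.inv(np.eye(3) - gamma * P_pi)[s0]   # row s0 of (I - gamma P_pi)^{-1}
print(rho_pi)        # discounted visitation weights of each state when starting from s0
print(rho_pi.sum())  # 1 / (1 - gamma) = 10: rho_pi is a weighting, not a normalized distribution
```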

Theorem 9.3 (Gradients of $\bar{v}_\pi$ and $\bar{r}_\pi$ in the discounted case). In the discounted case where $\gamma \in [0,1)$, the gradients of $\bar{v}_\pi$ and $\bar{r}_\pi$ are, respectively,

$$\begin{aligned} \nabla_\theta \bar{v}_\pi & \approx \frac{1}{1-\gamma} \sum_s d_\pi(s) \sum_a \nabla_\theta \pi(a|s,\theta)\, q_\pi(s,a) \\ \nabla_\theta \bar{r}_\pi & \approx \sum_s d_\pi(s) \sum_a \nabla_\theta \pi(a|s,\theta)\, q_\pi(s,a) \end{aligned}$$

where the approximations are more accurate when $\gamma$ is closer to $1$.

9.4 Policy gradient by Monte Carlo estimation: REINFORCE

Consider $J(\theta) = \bar{r}_\pi(\theta)$ or $J(\theta) = v_\pi(s_0)$. The gradient-ascent algorithm for maximizing $J(\theta)$ is

$$\begin{aligned} \theta_{t+1} & = \theta_t + \alpha \nabla_\theta J(\theta) \\ & = \theta_t + \alpha \mathbb{E}[\nabla_\theta \ln\pi(A|S,\theta_t)\, q_\pi(S,A)] \end{aligned}$$

where $\alpha>0$ is a constant learning rate. Since the expected value on the right-hand side is unknown, we can replace it with a sample (the idea of stochastic gradient ascent). Then we have

$$\theta_{t+1} = \theta_t + \alpha \nabla_\theta \ln\pi(a_t|s_t,\theta_t)\, q_\pi(s_t,a_t)$$

However, this still cannot be implemented because $q_\pi(s_t,a_t)$ is the true action value, which we cannot obtain. Hence, we use an estimate $q_t(s_t,a_t)$ in place of the true action value $q_\pi(s_t,a_t)$:

$$\theta_{t+1} = \theta_t + \alpha \nabla_\theta \ln\pi(a_t|s_t,\theta_t)\, q_t(s_t,a_t)$$

If $q_\pi(s_t,a_t)$ is approximated by Monte Carlo estimation,

$$\begin{aligned} q_\pi(s_t,a_t) & \triangleq \mathbb{E}[G_t|S_t=s_t, A_t=a_t] \\ & \textcolor{blue}{\approx \frac{1}{n} \sum_{i=1}^n g^{(i)}(s_t,a_t)} \end{aligned}$$

then with stochastic approximation we do not need to collect $n$ episodes starting from $(s_t,a_t)$ to approximate $q_\pi(s_t,a_t)$; a single discounted return starting from $(s_t,a_t)$ suffices:

$$q_\pi(s_t,a_t) \approx q_t(s_t,a_t) = \sum_{k=t+1}^T \gamma^{k-t-1}r_k$$
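In code, this single-sample estimate is a backward pass over the episode's rewards, using the recursion $q_t = r_{t+1} + \gamma q_{t+1}$. A small sketch (the function name is my own):

```python
def discounted_returns(rewards, gamma):
    """q_t = sum_{k=t+1}^{T} gamma^{k-t-1} r_k for every t, computed by a backward pass."""
    returns, g = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g      # rewards[t] plays the role of r_{t+1}
        returns[t] = g
    return returns

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.9))   # [2.62, 1.8, 2.0]
```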

The algorithm is called REINFORCE.

Pseudocode:

[Figure: pseudocode of the REINFORCE algorithm]
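Since the figure is not reproduced here, the following is a minimal NumPy sketch of REINFORCE on a made-up 2-state, 2-action MDP with a tabular softmax policy; the environment, horizon, and hyperparameters are my own illustration rather than the book's example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 2, 2, 0.9, 0.1

# --- a made-up MDP: P[s, a] = next-state distribution, R[s, a] = one-step reward ---
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])               # only action 0 in state 0 is rewarded

theta = np.zeros((n_states, n_actions))  # tabular softmax policy parameters

def pi(s, theta):
    """pi(.|s, theta): softmax over the parameters of state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

for episode in range(2000):
    # 1) generate an episode {(s_t, a_t, r_{t+1})} by following pi(.|s, theta)
    s, traj = 0, []
    for _ in range(20):                  # fixed horizon T = 20
        a = rng.choice(n_actions, p=pi(s, theta))
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])

    # 2) backward pass: q_t estimated by the sampled discounted return
    g = 0.0
    for s, a, r in reversed(traj):
        g = r + gamma * g
        # 3) stochastic gradient ascent: theta <- theta + alpha * grad ln pi(a|s,theta) * q_t
        grad_log = -pi(s, theta)         # gradient of ln softmax w.r.t. theta[s], all entries
        grad_log[a] += 1.0               # plus 1 at the taken action
        theta[s] += alpha * g * grad_log

# the learned policy should put most probability on action 0 in both states
print(pi(0, theta), pi(1, theta))
```

Compared with the pseudocode, the only substitutions are the toy environment and the explicit softmax gradient; the value-estimation step uses the sampled return $q_t(s_t,a_t)$ and the policy-improvement step is the stochastic gradient-ascent update derived above.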

Reference

Shiyu Zhao's course and textbook, Mathematical Foundation of Reinforcement Learning.
