#include <stdio.h>
int arr[101][101];
int n;
int i,j;
int m;
int tmp;
void Print(){
    for(i=1;i<n;i++){
        for(j=1;j<n-i+1;j++)printf("%d ",arr[i][j]);
        puts("");
    }
}
void fun(){
    // i and j start at 1
    i=1,j=1;
    // fallback value
    tmp=1;
    // counting starts from 1
    m=1;
    while(1)…
Lecture 5: Monte Carlo Learning
The simplest MC-based RL algorithm: MC Basic
The key to understanding the MC Basic algorithm is understanding how to adapt the policy iteration algorithm to the model-free setting.
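As a rough illustration of what "model-free" means here, the sketch below assumes that the action value q(s, a) is estimated as the average discounted return of sampled episodes starting from (s, a), in place of a computation that needs the environment model. It is a minimal sketch under that assumption, not the lecture's own code. The function names discounted_return, mc_q_estimate and greedy_action are hypothetical, and the episodes are assumed to have been collected already.

```c
#include <stddef.h>

/* Discounted return of one episode:
   r[0] + gamma*r[1] + gamma^2*r[2] + ...
   (hypothetical helper; rewards are assumed to be already sampled) */
double discounted_return(const double *rewards, size_t len, double gamma)
{
    double g = 0.0;
    /* Walk backwards so that each step is g = r_t + gamma * g. */
    for (size_t t = len; t-- > 0;)
        g = rewards[t] + gamma * g;
    return g;
}

/* Monte Carlo estimate of q(s, a): the mean of the returns of episodes
   that all start from (s, a) and then follow the current policy.
   Returns 0.0 when no episode is available. */
double mc_q_estimate(const double *returns, size_t num_episodes)
{
    double sum = 0.0;
    for (size_t i = 0; i < num_episodes; i++)
        sum += returns[i];
    return num_episodes ? sum / (double)num_episodes : 0.0;
}

/* Greedy improvement for a single state: index of the action with the
   largest estimated q value (ties go to the lowest index). */
size_t greedy_action(const double *q_values, size_t num_actions)
{
    size_t best = 0;
    for (size_t a = 1; a < num_actions; a++)
        if (q_values[a] > q_values[best])
            best = a;
    return best;
}
```

In use, one would call discounted_return once per sampled episode, pass the resulting array to mc_q_estimate for each state-action pair, and then call greedy_action per state to obtain the improved policy.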
Each iteration of the policy iteration algorithm consists of two steps: { Policy evaluation: $v_{\pi_k} = r_{\pi_k} + \gamma$…