After obtaining the trained .pth file, the next step is to deploy it and evaluate its performance.
This can be done in a Jupyter notebook; the file is examples/notebooks/run_policy.ipynb in the robomimic folder.
This article steps through it in PyCharm instead.
The script evaluates a policy in an environment: it downloads a checkpoint from the model zoo, loads the checkpoint in PyTorch, and runs the policy for evaluation.
Contents
I. Parameter Description
II. Running from the Terminal
1. Running the evaluation
2. Saving method 1
3. Saving method 2
III. Step-by-step Walkthrough (run a trained policy and visualize the rollout)
1. Imports
2. Download the policy checkpoint
3. Load the trained policy
4. Create the rollout environment
5. Define the rollout loop
6. Run the policy
7. Visualize the rollout
8. Results
I. Parameter Description
"""
The main script for evaluating a policy in an environment.
Args:
agent (str): path to saved checkpoint pth file
horizon (int): if provided, override maximum horizon of rollout from the one
in the checkpoint
env (str): if provided, override name of env from the one in the checkpoint,
and use it for rollouts
render (bool): if flag is provided, use on-screen rendering during rollouts
video_path (str): if provided, render trajectories to this video file path
video_skip (int): render frames to a video every @video_skip steps
camera_names (str or [str]): camera name(s) to use for rendering on-screen or to video
dataset_path (str): if provided, an hdf5 file will be written at this path with the
rollout data
dataset_obs (bool): if flag is provided, and @dataset_path is provided, include
possible high-dimensional observations in output dataset hdf5 file (by default,
observations are excluded and only simulator states are saved).
seed (int): if provided, set seed for rollouts
Example usage:
# Evaluate a policy with 50 rollouts of maximum horizon 400 and save the rollouts to a video.
# Visualize the agentview and wrist cameras during the rollout.
python run_trained_agent.py --agent /path/to/model.pth \
--n_rollouts 50 --horizon 400 --seed 0 \
--video_path /path/to/output.mp4 \
--camera_names agentview robot0_eye_in_hand
# Write the 50 agent rollouts to a new dataset hdf5.
python run_trained_agent.py --agent /path/to/model.pth \
--n_rollouts 50 --horizon 400 --seed 0 \
--dataset_path /path/to/output.hdf5 --dataset_obs
# Write the 50 agent rollouts to a new dataset hdf5, but exclude the dataset observations
# since they might be high-dimensional (they can be extracted again using the
# dataset_states_to_obs.py script).
python run_trained_agent.py --agent /path/to/model.pth \
--n_rollouts 50 --horizon 400 --seed 0 \
--dataset_path /path/to/output.hdf5
"""
agent (str): path to the saved checkpoint model file (.pth).
horizon (int): if provided, overrides the maximum rollout length stored in the checkpoint, i.e., how many steps each evaluation rollout runs.
env (str): if provided, overrides the environment name saved in the checkpoint; a fresh environment with this name is created for the rollouts.
render (bool): if this flag is given, each rollout is rendered on-screen in real time.
video_path (str): if provided, the rollouts are recorded as a video saved to this path.
video_skip (int): a frame is written to the video every @video_skip steps.
camera_names (str or [str]): name(s) of the camera(s) used for rendering.
dataset_path (str): if provided, the rollout data is written to an hdf5 file at this path.
dataset_obs (bool): if this flag and @dataset_path are both given, the hdf5 file also includes the (possibly high-dimensional) observations; by default only simulator states are saved.
seed (int): if provided, sets the random seed for the rollouts.
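To make the flag-to-behavior mapping concrete, here is a hypothetical argparse sketch of how such a command-line interface could be wired up. The flag names follow the docstring above, but the defaults and types are illustrative assumptions, not the actual run_trained_agent.py source.

import argparse

# Hypothetical CLI reconstruction: flag names match the docstring above,
# while the defaults here are assumptions for illustration.
parser = argparse.ArgumentParser()
parser.add_argument("--agent", type=str, required=True, help="path to saved checkpoint pth file")
parser.add_argument("--n_rollouts", type=int, default=50, help="number of rollouts to run")
parser.add_argument("--horizon", type=int, default=None, help="override max rollout horizon from the checkpoint")
parser.add_argument("--env", type=str, default=None, help="override env name from the checkpoint")
parser.add_argument("--render", action="store_true", help="on-screen rendering during rollouts")
parser.add_argument("--video_path", type=str, default=None, help="render rollouts to this video file")
parser.add_argument("--video_skip", type=int, default=5, help="write a frame every N steps")
parser.add_argument("--camera_names", type=str, nargs="+", default=["agentview"], help="camera(s) to render")
parser.add_argument("--dataset_path", type=str, default=None, help="write rollout data to this hdf5 path")
parser.add_argument("--dataset_obs", action="store_true", help="also store observations in the hdf5")
parser.add_argument("--seed", type=int, default=None, help="random seed for rollouts")
args = parser.parse_args()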
II. Running from the Terminal
Save the rollouts to a video and render the camera views along the way.

1. Running the evaluation
Evaluate the policy with 50 rollouts of maximum horizon 400 (each rollout runs at most 400 steps), save the rollouts to a video file, and visualize the agentview and wrist cameras during the rollouts.
python run_trained_agent.py --agent /path/to/model.pth \
--n_rollouts 50 --horizon 400 --seed 0 \
--video_path /path/to/output.mp4 \
--camera_names agentview robot0_eye_in_hand
2. Saving method 1
Write the 50 rollouts, including observations, to an hdf5 dataset file.
python run_trained_agent.py --agent /path/to/model.pth \
--n_rollouts 50 --horizon 400 --seed 0 \
--dataset_path /path/to/output.hdf5 --dataset_obs
3. Saving method 2
Write the 50 rollouts to an hdf5 dataset file, but exclude the observations, since they may be high-dimensional. They can be extracted again later with the dataset_states_to_obs.py script.
python run_trained_agent.py --agent /path/to/model.pth \
--n_rollouts 50 --horizon 400 --seed 0 \
--dataset_path /path/to/output.hdf5
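Once a rollout hdf5 has been written, it can be inspected with h5py. Below is a minimal sketch, assuming the usual robomimic layout of one demo_<i> group per rollout under a top-level data group; verify the actual layout on your own file.

import h5py

# Sketch: inspect the rollout dataset written by run_trained_agent.py.
# Assumed layout: a top-level "data" group containing demo_0, demo_1, ...
with h5py.File("/path/to/output.hdf5", "r") as f:
    demos = list(f["data"].keys())
    print("number of rollouts:", len(demos))
    first = f["data"][demos[0]]
    print("keys in first rollout:", list(first.keys()))  # e.g. states, actions
    print("actions shape:", first["actions"].shape)      # (T, action_dim)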
III. Step-by-step Walkthrough (run a trained policy and visualize the rollout)
The robomimic authors provide an example Python notebook. We first run through it, then analyze each step.
The example runs in Jupyter and assumes robomimic and robosuite are already installed.
1. Imports
import argparse
import json
import h5py
import imageio
import numpy as np
import os
from copy import deepcopy

import torch

import robomimic
import robomimic.utils.file_utils as FileUtils
import robomimic.utils.torch_utils as TorchUtils
import robomimic.utils.tensor_utils as TensorUtils
import robomimic.utils.obs_utils as ObsUtils
from robomimic.envs.env_base import EnvBase
from robomimic.algo import RolloutPolicy

import urllib.request
2. Download the policy checkpoint
Download the pretrained model from the model zoo.
A quick note on checkpoints: in deep learning, a checkpoint is a snapshot of the model state saved during training, typically containing the model parameters (weights and biases), the optimizer state, and other training metadata.
Saving checkpoints periodically during training lets you resume training later or use them for evaluation and inference.
Parameter (weight and bias) files are usually .ckpt files in TensorFlow and .pt or .pth files in PyTorch.
# Get pretrained checkpoint from the model zoo
ckpt_path = "lift_ph_low_dim_epoch_1000_succ_100.pth"
# Lift (Proficient Human)
urllib.request.urlretrieve(
"http://downloads.cs.stanford.edu/downloads/rt_benchmark/model_zoo/lift/bc_rnn/lift_ph_low_dim_epoch_1000_succ_100.pth",
filename=ckpt_path
)
assert os.path.exists(ckpt_path)
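Before restoring the policy, it can be instructive to peek at what the downloaded file contains. A small sketch, assuming the .pth file is a plain torch-serialized dictionary; the exact keys vary across robomimic versions.

# Sketch: look inside the checkpoint file (assumption: a torch-serialized dict).
ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt))          # expect a dict
print(list(ckpt.keys()))   # e.g. model weights, training config, env metadata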
3. Load the trained policy
Call the policy_from_checkpoint function to build the right model from the checkpoint and load the trained weights (the checkpoint can also be loaded manually).
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
# restore policy
policy, ckpt_dict = FileUtils.policy_from_checkpoint(ckpt_path=ckpt_path, device=device, verbose=True)
4. Create the rollout environment
A quick note on rollouts: a rollout (roughly, a simulated playout) is the trajectory of interactions generated as the agent acts in the environment under the current policy; the collected data is used to evaluate or improve that policy.
A rollout can contain one or more complete episodes, or just part of one (in practice a rollout usually holds exactly one episode).
The policy checkpoint contains enough information to recreate the environment it was trained in (the environment can also be created manually).
# create environment from saved checkpoint
env, _ = FileUtils.env_from_checkpoint(
ckpt_dict=ckpt_dict,
render=False, # we won't do on-screen rendering in the notebook
render_offscreen=True, # render to RGB images for video
verbose=True,
)
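A quick sanity check of the restored environment can catch observation mismatches early. A minimal sketch; the printed keys depend on the task and observation modalities.

# Sketch: sanity-check the environment restored from the checkpoint.
obs = env.reset()
print(list(obs.keys()))                          # low-dim observation keys for this task
print({k: np.shape(v) for k, v in obs.items()})  # per-key shapes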
5. Define the rollout loop
Define the main rollout loop, which runs the policy up to the target horizon and optionally writes the rollout to a video.
def rollout(policy, env, horizon, render=False, video_writer=None, video_skip=5, camera_names=None):
    """
    Helper function to carry out rollouts. Supports on-screen rendering, off-screen rendering to a video,
    and returns the rollout trajectory.

    Args:
        policy (instance of RolloutPolicy): policy loaded from a checkpoint
        env (instance of EnvBase): env loaded from a checkpoint or demonstration metadata
        horizon (int): maximum horizon for the rollout
        render (bool): whether to render rollout on-screen
        video_writer (imageio writer): if provided, use to write rollout to video
        video_skip (int): how often to write video frames
        camera_names (list): determines which camera(s) are used for rendering. Pass more than
            one to output a video with multiple camera views concatenated horizontally.

    Returns:
        stats (dict): some statistics for the rollout - such as return, horizon, and task success
    """
    assert isinstance(env, EnvBase)
    assert isinstance(policy, RolloutPolicy)
    assert not (render and (video_writer is not None))

    policy.start_episode()
    obs = env.reset()
    state_dict = env.get_state()

    # hack that is necessary for robosuite tasks for deterministic action playback
    obs = env.reset_to(state_dict)

    results = {}
    video_count = 0  # video frame counter
    total_reward = 0.
    try:
        for step_i in range(horizon):

            # get action from policy
            act = policy(ob=obs)

            # play action
            next_obs, r, done, _ = env.step(act)

            # compute reward
            total_reward += r
            success = env.is_success()["task"]

            # visualization
            if render:
                env.render(mode="human", camera_name=camera_names[0])
            if video_writer is not None:
                if video_count % video_skip == 0:
                    video_img = []
                    for cam_name in camera_names:
                        video_img.append(env.render(mode="rgb_array", height=512, width=512, camera_name=cam_name))
                    video_img = np.concatenate(video_img, axis=1)  # concatenate horizontally
                    video_writer.append_data(video_img)
                video_count += 1

            # break if done or if success
            if done or success:
                break

            # update for next iter
            obs = deepcopy(next_obs)
            state_dict = env.get_state()

    except env.rollout_exceptions as e:
        print("WARNING: got rollout exception {}".format(e))

    stats = dict(Return=total_reward, Horizon=(step_i + 1), Success_Rate=float(success))

    return stats
6. Run the policy
Note that the parameters set here correspond one-to-one to the terminal command's arguments.
rollout_horizon = 400                                  # maximum steps per rollout (cf. --horizon)
np.random.seed(0)                                      # seed numpy for reproducibility (cf. --seed)
torch.manual_seed(0)                                   # seed torch as well
video_path = "rollout.mp4"                             # output video file (cf. --video_path)
video_writer = imageio.get_writer(video_path, fps=20)  # writer passed into the rollout loop
Run the main rollout function:
stats = rollout(
policy=policy,
env=env,
horizon=rollout_horizon,
render=False,
video_writer=video_writer,
video_skip=5,
camera_names=["agentview"]
)
print(stats)
video_writer.close()
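The notebook runs a single rollout, while the terminal script's --n_rollouts flag repeats it and aggregates statistics. Below is a sketch of the same aggregation in the notebook, with video writing disabled to keep it fast; the value of n_rollouts is an arbitrary choice.

# Sketch: repeat the rollout N times and average the per-episode stats,
# mirroring what --n_rollouts does in run_trained_agent.py (illustrative only).
n_rollouts = 10
all_stats = [
    rollout(policy=policy, env=env, horizon=rollout_horizon, render=False,
            video_writer=None, video_skip=5, camera_names=["agentview"])
    for _ in range(n_rollouts)
]
avg = {k: np.mean([s[k] for s in all_stats]) for k in all_stats[0]}
print("average over {} rollouts: {}".format(n_rollouts, avg))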
7. Visualize the rollout
from IPython.display import Video
Video(video_path)
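IPython.display.Video only renders inside Jupyter. When debugging in PyCharm, as this article does, one option is simply to open rollout.mp4 with the system's default video player; a portable sketch:

import os
import subprocess
import sys

# Open the rendered video with the OS default player (outside Jupyter).
if sys.platform.startswith("win"):
    os.startfile("rollout.mp4")                   # Windows
elif sys.platform == "darwin":
    subprocess.run(["open", "rollout.mp4"])       # macOS
else:
    subprocess.run(["xdg-open", "rollout.mp4"])   # Linux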
8. Results
The pretrained checkpoint was downloaded and the rollout video generated; viewing the video shows the robot performing the grasp-and-lift motion.