Installing the extensions
Install the Remote-SSH and Python extensions on the local machine.
Install the Python extension on the remote server.
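- Difficulty: installing the Python extension failed on both the local machine and the remote server, possibly because of network or proxy issues.
- Solution:
- On the local machine, download the Python extension from the official Marketplace: https://marketplace.visualstudio.com/items?itemName=ms-python.python
Put the .vsix file in VS Code's bin directory and run code --install-extension ms-python.python-2022.9.11681004.vsix from there. That is all it takes.
(For details see https://www.hangge.com/blog/cache/detail_3191.html)
- For the server, first copy the extension's .vsix file up from the local machine with scp, then give it execute permission. My early attempts failed precisely because I had not granted that permission. After a plain chmod 777, Install from VSIX worked (chmod +x should be enough as well).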
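The upload-and-chmod step for the server can be sketched as a small Python helper that only builds the commands, so they can be reviewed before running. The host `user@server` and the directory `/tmp` are placeholders that are not from the original notes, and the final "Install from VSIX" step is still done from the VS Code window connected to the server.

```python
import shlex


def upload_commands(vsix, host, remote_dir):
    """Return the scp and chmod commands as argument lists (nothing is executed)."""
    # Take the file name whether the local path uses / or \ separators.
    name = vsix.replace("\\", "/").rsplit("/", 1)[-1]
    remote_path = f"{remote_dir.rstrip('/')}/{name}"
    return [
        # Copy the .vsix from the local machine to the server.
        ["scp", vsix, f"{host}:{remote_path}"],
        # Grant execute permission on the uploaded file (the step that was missing).
        ["ssh", host, f"chmod +x {shlex.quote(remote_path)}"],
    ]


if __name__ == "__main__":
    # Hypothetical host and target directory; replace with your own.
    for cmd in upload_commands(
        "ms-python.python-2022.9.11681004.vsix", "user@server", "/tmp"
    ):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.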
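After that, the environment showed up:
Now I can pick my own conda environment on the server for debugging:
The "permission" problem that cost me a day and a half is finally cracked! Right now I feel like listening to 越权访问 a hundred times just to make it stick…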
Next, go to Run -> Add Configuration.
The launch.json looks like this:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Launch",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
            "args": [
                "--do_train",
                "--num_thread_reader=0",
                "--epochs=5",
                "--batch_size=128",
                "--n_display=50",
                "--train_csv",
                "${env:DATA_PATH}/MSRVTT_train.9k.csv",
                "--val_csv",
                "${env:DATA_PATH}/MSRVTT_JSFUSION_test.csv",
                "--data_path",
                "${env:DATA_PATH}/MSRVTT_data.json",
                "--features_path",
                "${env:DATA_PATH}/MSRVTT_Videos",
                "--output_dir",
                "ckpts/ckpt_msrvtt_retrieval_looseType",
                "--lr",
                "1e-4",
                "--max_words",
                "32",
                "--max_frames",
                "12",
                "--batch_size_val",
                "16",
                "--datatype",
                "msrvtt",
                "--expand_msrvtt_sentences",
                "--feature_framerate",
                "1",
                "--coef_lr",
                "1e-3",
                "--freeze_layer_num",
                "0",
                "--slice_framepos",
                "2",
                "--loose_type",
                "--linear_patch",
                "2d",
                "--sim_header",
                "meanP",
                "--pretrained_clip_name",
                "ViT-B/32"
            ],
            "env": {
                "DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"
            },
            "console": "integratedTerminal"
        }
    ]
}
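One problem then appeared: the referenced env variable showed up empty on the command line. Most likely this is because ${env:DATA_PATH} is resolved from the environment that VS Code itself runs in, not from the "env" block of the same configuration. So this style of reference does not work here for now, and I had to fall back on the clumsy approach of pasting the full path into each argument.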
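To take some of the pain out of pasting the same prefix by hand, the path-bearing arguments can be generated once and copied into launch.json. This is only a convenience sketch: the helper name `msrvtt_path_args` is made up for illustration, while the flags and file names are the ones used in the configuration above.

```python
import json


def msrvtt_path_args(data_path):
    """Build the four path arguments with the data directory already expanded."""
    base = data_path.rstrip("/")
    return [
        "--train_csv", f"{base}/MSRVTT_train.9k.csv",
        "--val_csv", f"{base}/MSRVTT_JSFUSION_test.csv",
        "--data_path", f"{base}/MSRVTT_data.json",
        "--features_path", f"{base}/MSRVTT_Videos",
    ]


if __name__ == "__main__":
    # Print a JSON array whose items can be pasted into the "args" list.
    print(json.dumps(msrvtt_path_args("/mnt/cloud_disk/wf/msrvtt_data"), indent=4))
```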
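In addition, python -m has to be expressed through the "module" field. "module" conflicts with "program", so the script path moves to the front of args and the configuration becomes: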
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Launch",
            "type": "python",
            "request": "launch",
            "module": "torch.distributed.launch",
            "args": [
                "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
                "--do_train",
                "--num_thread_reader=0",
                "--epochs=5",
                "--batch_size=128",
                "--n_display=50",
                "--train_csv",
                "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_train.9k.csv",
                "--val_csv",
                "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_JSFUSION_test.csv",
                "--data_path",
                "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_data.json",
                "--features_path",
                "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_Videos",
                "--output_dir",
                "ckpts/ckpt_msrvtt_retrieval_looseType",
                "--lr",
                "1e-4",
                "--max_words",
                "32",
                "--max_frames",
                "12",
                "--batch_size_val",
                "16",
                "--datatype",
                "msrvtt",
                "--expand_msrvtt_sentences",
                "--feature_framerate",
                "1",
                "--coef_lr",
                "1e-3",
                "--freeze_layer_num",
                "0",
                "--slice_framepos",
                "2",
                "--loose_type",
                "--linear_patch",
                "2d",
                "--sim_header",
                "meanP",
                "--pretrained_clip_name",
                "ViT-B/32"
            ],
            "console": "integratedTerminal"
        }
    ]
}
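The change from a "program" entry to a "module" entry is mechanical, so it can be illustrated with a short transformation over the configuration dictionary. The function `to_module_config` is a hypothetical helper written for this note, not part of VS Code or debugpy.

```python
import copy


def to_module_config(config, module):
    """Return a copy of a launch configuration that runs `module` via python -m.

    The former "program" path becomes the first argument, which mirrors
    `python -m <module> <script> <args...>` on the command line.
    """
    new_config = copy.deepcopy(config)   # leave the caller's dict untouched
    script = new_config.pop("program")   # "program" and "module" cannot coexist
    new_config["module"] = module
    new_config["args"] = [script] + new_config.get("args", [])
    return new_config


if __name__ == "__main__":
    original = {
        "name": "Python: Launch",
        "type": "python",
        "request": "launch",
        "program": "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
        "args": ["--do_train"],
    }
    print(to_module_config(original, "torch.distributed.launch"))
```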
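After setting breakpoints and starting the debugger, I ran into this problem:
Stepping through statement by statement showed that it came from a place where a model is loaded: the model file had been put in the wrong location. Remote debugging really is handy, since the call stack shows exactly how execution got there.
I then found the following problem:
In this piece of code, frameCount came out as 0 and fps was 0 as well, which triggered a division-by-zero error.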
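A misplaced model file is easier to spot when the path is checked before the loader is called. The sketch below is a generic guard rather than the project's actual loading code, and `require_file` is a name invented here.

```python
from pathlib import Path


def require_file(path, what="model file"):
    """Return the resolved path, or fail early with a message that names it."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"{what} not found at {resolved}")
    return resolved
```

Calling it right before the load step turns a confusing failure deep inside the loader into an error that prints the exact path that was tried.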
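A guard like the following makes this failure explicit instead of surfacing as a bare ZeroDivisionError. It assumes the two values come from a video reader (for example OpenCV's VideoCapture properties) and that the division computes a duration in seconds. Both are assumptions, since the original code is not shown here. A zero for both values often means the video could not be opened at all, so the path is the first thing to check.

```python
def duration_seconds(frame_count, fps):
    """Return the video duration in whole seconds, rounded up.

    Raises ValueError when the reader reported no frames or no frame rate,
    which would otherwise end in a division by zero.
    """
    if frame_count <= 0 or fps <= 0:
        raise ValueError(
            f"invalid video metadata: frame_count={frame_count}, fps={fps}; "
            "check that the video path exists and the file can be decoded"
        )
    # Ceiling division on integers: (a + b - 1) // b
    return (frame_count + fps - 1) // fps
```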