Website:
GitHub - jerpelhan/DAVE
After downloading, read the README file.
Open a new terminal and print the file tree, excluding hidden files:
Command: tree -I '.*'
.
├── LICENSE
├── README.md
├── demo.py
├── demo_zero.py
├── main.py
├── material
│ ├── 458.jpg
│ ├── 7.jpg
│ ├── 7707.jpg
│ ├── __init__.py
│ ├── arch.png
│ └── qualitative.png
├── models
│ ├── __init__.py
│ ├── backbone.py
│ ├── box_prediction.py
│ ├── dave.py
│ ├── dave_tr.py
│ ├── feat_comparison.py
│ ├── positional_encoding.py
│ ├── regression_head.py
│ └── transformer.py
├── scripts
│ ├── fscd_0_test.sh
│ ├── fscd_0shot_clip.sh
│ ├── fscd_1_test.sh
│ ├── fscd_lvis_test.sh
│ ├── fscd_lvis_unseen_test.sh
│ ├── fscd_multicat.sh
│ ├── fscd_test.sh
│ ├── train_det.sh
│ └── train_sim.sh
├── train_det.py
├── train_similarity.py
└── utils
├── __init__.py
├── arg_parser.py
├── data.py
├── data_lvis.py
├── eval.py
├── helpers.py
└── losses.py

5 directories, 38 files
(1) Create a new environment to avoid package conflicts.
Choose the installed dave environment as the interpreter.
(2) pip install the dependencies.
(3) Download the data files & model files, and make sure the paths are correct.
(4) Run main.py.
First error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
Fix: change every GPU reference to CPU, and remove the parallel computation.
All changes so far are in main.py → the evaluate* functions.
The first function changed:
def evaluate(args):
    device = torch.device("cpu")
The second function changed:
def eval_0shot(args):
    print("0shot")
    if args.skip_test:
        return
    args.zero_shot = True
    device = torch.device("cpu")
The third function changed:
def eval_0shot_multicat(args):
    args.zero_shot = True
    device = torch.device("cpu")
The fourth function changed:
def evaluate_LVIS(args):
    device = torch.device("cpu")
The fifth function changed:
def evaluate_multicat(args):
    device = torch.device("cpu")
All devices (GPU → CPU) and the parallel computation have now been modified.
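The per-function edits above all follow one pattern; a more portable version (a sketch, not the repo's code) selects the device at runtime so the same script runs on both GPU and CPU machines:

```python
import torch

# Pick CUDA when available, otherwise fall back to CPU, instead of
# hard-coding either device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors and models moved with .to(device) then work on both machines.
x = torch.zeros(2, 3).to(device)
print(x.device.type)
```

With this pattern there would have been nothing to edit per function: on a CUDA-less Mac it silently resolves to CPU.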
Second error: FileNotFoundError: [Errno 2] No such file or directory: 'material/.pth'
Code:
if __name__ == '__main__':
    print("DAVE")
    parser = argparse.ArgumentParser('DAVE', parents=[get_argparser()])
    args = parser.parse_args()
    print(args)
Given the code and the file structure, set a breakpoint on the parser line.
from utils.arg_parser import get_argparser
Download the files and copy them into the project.
Step over, then step in, and fix the file paths.
After modification, use relative paths:
Contents of the data file:
Paths fixed; try running again.
Still the same error: FileNotFoundError: [Errno 2] No such file or directory: 'material/.pth'
Breakpoint
Debug again: together with the conditional statements that follow, single-step and print the branch results to locate the error:
Add a breakpoint; F9 toggles the breakpoint, Continue jumps to it.
The console throws warnings:
(1) The 'pretrained' parameter is deprecated:
/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
(2)
warnings.warn(
/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
(3)
warnings.warn(msg)
/Users/dearr/Downloads/DAVE-master 3/main.py:35: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
(4)
torch.load(os.path.join(args.model_path, args.model_name + '.pth'))['model'], strict=False
(5) The output of print(args):
Namespace(aux_weight=0.3, backbone='resnet18', backbone_lr=0, batch_size=4, count_loss_weight=0, d_s=1.0, d_t=3, data_path='data', dataset='fsc147', det_model_name='DAVE', det_train=False, detection_loss_weight=0.01, dropout=0.1, egv=0.132, emb_dim=256, epochs=200, eval_multicat=False, fcos_pred_size=512, i_thr=0.55, image_size=512, kernel_dim=3, lr=0.0001, lr_drop=200, m_s=0.0, max_grad_norm=0.1, min_count_loss_weight=0, model_name='', model_path='material/', norm_s=False, normalized_l2=False, num_dec_layers=3, num_enc_layers=3, num_heads=8, num_objects=3, num_workers=12, orig_dmaps=False, pre_norm=False, prompt_shot=False, reduction=8, resume_training=False, s_t=0.008, skip_cars=False, skip_test=False, skip_train=False, swav_backbone=False, task='fscd147', tiling_p=0.5, unseen=False, use_appearance=False, use_objectness=False, use_query_pos_emb=False, weight_decay=0.0001, zero_shot=False)
Set breakpoints and step through:
Locate the failing line; print-debugging shows model_name='', which is why it fails. But what should it be?
Check the project's public homepage: the command it gives for demo.py runs correctly, so copy it verbatim.
python demo.py --skip_train --model_name DAVE_3_shot --model_path material --backbone resnet50 --swav_backbone --reduction 8 --num_enc_layers 3 --num_dec_layers 3 --kernel_dim 3 --emb_dim 256 --num_objects 3 --num_workers 8 --use_query_pos_emb --use_objectness --use_appearance --batch_size 1 --pre_norm
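Since the root cause was an empty --model_name producing the bogus path material/.pth, a fail-fast check before calling torch.load would have surfaced the problem immediately. A minimal sketch (checkpoint_path is a hypothetical helper, not part of the repo; the '.pth' naming follows main.py):

```python
import os

def checkpoint_path(model_path: str, model_name: str) -> str:
    """Build the checkpoint path, failing loudly on bad arguments."""
    if not model_name:
        raise ValueError(
            "--model_name is empty; pass e.g. --model_name DAVE_3_shot")
    path = os.path.join(model_path, model_name + ".pth")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return path
```

A ValueError naming the missing flag would have saved the whole breakpoint session spent chasing 'material/.pth'.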
Third error: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Ask Copilot: as expected, torch.load is failing; adding one parameter fixes it.
# model.load_state_dict(
#     torch.load(os.path.join(args.model_path, args.model_name + '.pth'))['model'], strict=False
# )
# Path error already fixed; for the device error, change it to:
model.load_state_dict(
    torch.load(os.path.join(args.model_path, args.model_name + '.pth'),
               map_location=torch.device('cpu'))['model'], strict=False
)

# pretrained_dict_feat = {k.split("feat_comp.")[1]: v for k, v in
#                         torch.load(os.path.join(args.model_path, 'verification.pth'))[
#                             'model'].items() if 'feat_comp' in k}
# torch.load fails here too; add the same parameter:
pretrained_dict_feat = {k.split("feat_comp.")[1]: v for k, v in
                        torch.load(os.path.join(args.model_path, 'verification.pth'),
                                   map_location=torch.device('cpu'))[
                            'model'].items() if 'feat_comp' in k}
model.module.feat_comp.load_state_dict(pretrained_dict_feat)
Fourth error: AttributeError: 'COTR' object has no attribute 'module'
A parallelization artifact: we are not running in parallel, so there is no .module wrapper.
# model.module.feat_comp.load_state_dict(pretrained_dict_feat)  # changed to:
model.feat_comp.load_state_dict(pretrained_dict_feat)
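Background on this error: checkpoints saved from a model wrapped in nn.DataParallel prefix every key with module., and the wrapper itself exposes the inner model as .module. When running unwrapped, a key-stripping helper (a sketch; strip_module_prefix is not part of the repo) is more robust than hard-coding either access path:

```python
import torch.nn as nn

def strip_module_prefix(state_dict):
    """Remove the leading 'module.' that nn.DataParallel adds to keys."""
    return {k[len("module."):] if k.startswith("module.") else k: v
            for k, v in state_dict.items()}

# A checkpoint written by a DataParallel model loads into a bare module:
net = nn.Linear(4, 2)
wrapped_sd = {"module." + k: v for k, v in net.state_dict().items()}
net.load_state_dict(strip_module_prefix(wrapped_sd))
```

The same helper also covers the reverse case, since keys without the prefix pass through unchanged.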
Fifth error: RuntimeError: stack expects each tensor to be equal size, but got [1, 400, 248] at entry 0 and [1, 512, 512] at entry 1
libc++abi: terminating due to uncaught exception of type std::__1::system_error: Broken pipe
Where to start:
(1) Ask Copilot
(2) Add breakpoints and debug
(3) Read the console output carefully
(4) Check the project's public discussion area for anyone hitting a similar error
GitHub - jerpelhan/DAVE
[issue] roundup - CSDN blog
Summary of the issues in the project's discussion area:
(1) Copilot says it is a tensor-size problem
(2) Add breakpoints, but where? The console output gives a rough idea
Console output walkthrough - CSDN blog
First, address a warning:
# parser.add_argument('--num_workers', default=12, type=int)  # changed to:
parser.add_argument('--num_workers', default=8, type=int)
Reason:
Warning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 8 (`cpuset` is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
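Rather than hard-coding 8, the default could be derived from the machine, which is what the warning suggests. A sketch (the flag name matches the repo's arg_parser.py; the cap of 8 is just this machine's suggested maximum):

```python
import argparse
import os

# Cap the worker count at the machine's CPU count so the DataLoader
# warning cannot fire on smaller machines.
default_workers = min(8, os.cpu_count() or 1)

parser = argparse.ArgumentParser()
parser.add_argument('--num_workers', default=default_workers, type=int)
args = parser.parse_args([])
print(args.num_workers)
```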
Suppress all remaining warnings:
import warnings
warnings.filterwarnings("ignore")
In-program breakpoint:
import pdb; pdb.set_trace()
It really is the for loop that fails, and I really don't know how to fix it yet.
(dave) (base) dearr@dearrdeMacBook-Air DAVE-master 3 % /Users/dearr/anaconda3/envs/dave/bin/python "/Users/dearr/Downloads/DAVE-master 3/main.py"
val
1286
loading annotations into memory...
Done (t=0.17s)
creating index...
index created!
Traceback (most recent call last):
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 147, in <module>
evaluate(args)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 73, in evaluate
test_loader_1 = [next(iter(test_loader)) for _ in range(2)]
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 73, in <listcomp>
test_loader_1 = [next(iter(test_loader)) for _ in range(2)]
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
return self._process_data(data)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
data.reraise()
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/_utils.py", line 706, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 317, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 174, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 174, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 142, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 214, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 400, 248] at entry 0 and [1, 512, 512] at entry 1
Sixth error: RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Try fixing this outer error first:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Set the worker count to 1:
parser.add_argument('--num_workers', default=1, type=int)
Output:
(dave) (base) dearr@dearrdeMacBook-Air DAVE-master 3 % /Users/dearr/anaconda3/envs/dave/bin/python "/Users/dearr/Downloads/DAVE-master 3/main.py"
val
1286
loading annotations into memory...
Done (t=0.16s)
creating index...
index created!
Traceback (most recent call last):
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 147, in <module>
evaluate(args)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 73, in evaluate
test_loader_1 = [next(iter(test_loader)) for _ in range(2)]
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 73, in <listcomp>
test_loader_1 = [next(iter(test_loader)) for _ in range(2)]
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
return self._process_data(data)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
data.reraise()
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/_utils.py", line 706, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 317, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 174, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 174, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 142, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 214, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 400, 248] at entry 0 and [1, 512, 512] at entry 1
Changed it, but no effect. Ask again.
Set it to 0; output:
(dave) (base) dearr@dearrdeMacBook-Air DAVE-master 3 % /Users/dearr/anaconda3/envs/dave/bin/python "/Users/dearr/Downloads/DAVE-master 3/main.py"
val
1286
loading annotations into memory...
Done (t=0.16s)
creating index...
index created!
Traceback (most recent call last):
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 147, in <module>
evaluate(args)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 73, in evaluate
test_loader_1 = [next(iter(test_loader)) for _ in range(2)]
File "/Users/dearr/Downloads/DAVE-master 3/main.py", line 73, in <listcomp>
test_loader_1 = [next(iter(test_loader)) for _ in range(2)]
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 673, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 317, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 174, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 174, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 142, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/Users/dearr/anaconda3/envs/dave/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 214, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 400, 248] at entry 0 and [1, 512, 512] at entry 1
With num_workers set to 0, this wrapper error is indeed gone:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Back to fixing the fifth error.
I am genuinely stuck: I only know the loop is where it fails, but no matter where I set breakpoints I cannot step inside the code.
Analyzing the console output: where does the 1286 come from? After asking, it is a print(len(...)) in data.py.
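Postscript on the fifth error: the val split yields images of different sizes ([1, 400, 248] vs [1, 512, 512]), which default_collate cannot stack when batch_size > 1. Two plausible fixes: run with --batch_size 1, as the demo.py command does, or pad every tensor in a batch to the batch's largest spatial size before stacking. A sketch of the latter for bare [C, H, W] tensors (pad_collate is hypothetical, not DAVE's own collate, and real batch items may be tuples of image plus annotations):

```python
import torch
import torch.nn.functional as F

def pad_collate(batch):
    """Zero-pad [C, H, W] tensors to the batch's max H and W, then stack."""
    max_h = max(t.shape[1] for t in batch)
    max_w = max(t.shape[2] for t in batch)
    # F.pad's tuple pads the last dims first: (left, right, top, bottom).
    padded = [F.pad(t, (0, max_w - t.shape[2], 0, max_h - t.shape[1]))
              for t in batch]
    return torch.stack(padded, 0)

# The two shapes from the traceback now stack without error:
batch = [torch.zeros(1, 400, 248), torch.zeros(1, 512, 512)]
print(pad_collate(batch).shape)  # torch.Size([2, 1, 512, 512])
```

It would be passed as DataLoader(..., collate_fn=pad_collate); whether padding is semantically safe for DAVE's density maps is a separate question, so batch_size 1 is the conservative workaround.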