1. Project repository:
IDEA-Research/Grounded-Segment-Anything: Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything (github.com)
2. Download the code
Option 1: download the ZIP archive
Option 2: git clone (strongly recommended)
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
3. Create a virtual environment
conda create -n label python=3.8
Then create a working folder and, inside it, run the git clone command from Option 2 of step 2 to download the code.
4. Open an IDE
Pick whichever IDE you are comfortable with for reading and debugging the code; this walkthrough uses PyCharm.
4.1 Configure the virtual environment
Follow the steps below and click OK.
Then remember to activate the virtual environment in PyCharm's terminal (conda activate label).
4.2 Install PyTorch
Official site: PyTorch (https://pytorch.org)
Choose the PyTorch build that matches your CUDA version; mine is CUDA 11.8.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
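Before continuing, it is worth confirming that the CUDA-enabled build was actually installed. A minimal check, assuming the label environment is active:
import torch

# The version string should end in +cu118 for the CUDA 11.8 wheel
print(torch.__version__)
# Should print True on a machine with a working CUDA driver
print(torch.cuda.is_available())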
4.3 Install the dependencies
cd Grounded-Segment-Anything
python -m pip install -e segment_anything
Before running the next command, make sure PyTorch is already installed, otherwise the build will fail with an error.
pip install --no-build-isolation -e GroundingDINO
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
cd Grounded-Segment-Anything
git submodule init
git submodule update
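Before moving on, a quick import check confirms that the three editable installs actually registered. A minimal sanity check, run inside the Grounded-Segment-Anything directory with the label environment active:
import segment_anything   # installed with `pip install -e segment_anything`
import groundingdino      # installed with `pip install --no-build-isolation -e GroundingDINO`
import ram                # installed with `pip install -e ./recognize-anything/`

print("segment_anything, groundingdino and ram all import correctly")
If import ram fails here, see the fix in section 4.4 below.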
You also need to download four model checkpoints:
https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth
Paste each link into your browser and the download starts automatically, or use the small script below.
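If downloading through the browser is inconvenient, the same four files can be fetched with a short standard-library script (the URLs and filenames are exactly the ones listed above); run it from the Grounded-Segment-Anything directory so the checkpoints land in the repository root:
import urllib.request

# (URL, local filename) pairs for the four checkpoints listed above
checkpoints = [
    ("https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
     "groundingdino_swint_ogc.pth"),
    ("https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
     "sam_vit_h_4b8939.pth"),
    ("https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth",
     "ram_swin_large_14m.pth"),
    ("https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth",
     "tag2text_swin_14m.pth"),
]

for url, filename in checkpoints:
    print(f"downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)  # the files range from hundreds of MB to a few GB, so this takes a while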
Install a few more packages:
pip install litellm
pip install nltk
pip install --upgrade transformers
4.4 Troubleshooting
During this process you may run into the following error:
(label) D:\Desktop\text\Grounded-Segment-Anything>python automatic_label_ram_demo.py
Traceback (most recent call last):
File "automatic_label_ram_demo.py", line 28, in <module>
from ram.models import ram
ModuleNotFoundError: No module named 'ram'
Solution:
import sys
import os
# Build the path to the 'recognize-anything' directory (next to this script)
recognize_anything_dir = os.path.join(os.path.dirname(__file__), 'recognize-anything')
# Add the 'recognize-anything' directory to the Python interpreter's search path
sys.path.append(recognize_anything_dir)
# Now the ram module can be imported
from ram.models import ram
This way the Python interpreter adds the recognize-anything directory to its search path, so your program can import the ram module correctly. Make sure this snippet sits at the top of your automatic_label_ram_demo.py file.
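A slightly more defensive variant of the same fix (purely optional) resolves an absolute path and prepends it, so the local recognize-anything checkout takes precedence over any other installed copy:
import os
import sys

# Absolute path to the recognize-anything checkout next to this script
recognize_anything_dir = os.path.abspath(
    os.path.join(os.path.dirname(__file__), "recognize-anything")
)

# Prepend rather than append so this copy wins on the search path
if recognize_anything_dir not in sys.path:
    sys.path.insert(0, recognize_anything_dir)

from ram.models import ram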
The command below is the one the official docs give. I consider it a trap: the backslash line continuations (and export) are bash syntax and do not work in the Windows command prompt, so the whole command has to be entered as a single line.
export CUDA_VISIBLE_DEVICES=0
python automatic_label_ram_demo.py \
--config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
--ram_checkpoint ram_swin_large_14m.pth \
--grounded_checkpoint groundingdino_swint_ogc.pth \
--sam_checkpoint sam_vit_h_4b8939.pth \
--input_image assets/demo9.jpg \
--output_dir "outputs" \
--box_threshold 0.25 \
--text_threshold 0.2 \
--iou_threshold 0.5 \
--device "cuda"
Correct examples (both the --key=value and the --key value argument styles work):
python automatic_label_ram_demo.py --config=GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --ram_checkpoint=ram_swin_large_14m.pth --grounded_checkpoint=groundingdino_swint_ogc.pth --sam_checkpoint=sam_vit_h_4b8939.pth --input_image="D:\Desktop\text\Grounded-Segment-Anything\bird.jpg" --output_dir="outputs" --box_threshold=0.25 --text_threshold=0.2 --iou_threshold=0.5 --device="cuda"
python automatic_label_ram_demo.py --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --ram_checkpoint ram_swin_large_14m.pth --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo9.jpg --output_dir "outputs" --box_threshold 0.25 --text_threshold 0.2 --iou_threshold 0.5 --device "cuda"