Linux 36.3 + JetPack v6.0@jetson-inference: Image Classification
- 1. Background
- 2. imagenet
- 2.1 Command-Line Options
- 2.2 Downloading the Models
- 2.3 Usage Examples
- 2.3.1 Single Image
- 2.3.2 Video
- 3. Code
- 3.1 Python
- 3.2 C++
- 4. References
- 5. Supplementary Notes
- 5.1 First-Run Local Model Initialization
- 5.2 Samba Symlinks
1. Background
From an application standpoint, image classification is one of the most fundamental operations in computer vision.
2. imagenet
The imageNet object accepts an input image and outputs the probability for each class. The GoogleNet and ResNet-18 models are downloaded automatically during the build process; they were trained on the ImageNet ILSVRC dataset of 1000 object classes.
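A minimal sketch of what this looks like from Python (assuming the jetson-inference Python bindings are installed and the googlenet model has already been downloaded; the image path below is just a placeholder pointing at one of the bundled samples):

from jetson_inference import imageNet
from jetson_utils import loadImage

net = imageNet("googlenet")               # built-in network name
img = loadImage("images/orange_0.jpg")    # sample image shipped with the project

class_id, confidence = net.Classify(img)  # top-1 class and its probability
print(f"{confidence * 100:.2f}% {net.GetClassLabel(class_id)}")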
2.1 Command-Line Options
$ imagenet --help
usage: imagenet [--help] [--network=NETWORK] ...
input_URI [output_URI]
Classify a video/image stream using an image recognition DNN.
See below for additional arguments that may not be shown above.
optional arguments:
--help show this help message and exit
--network=NETWORK pre-trained model to load (see below for options)
--topK=N show the topK number of class predictions (default: 1)
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
imageNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* alexnet
* googlenet (default)
* googlenet-12
* resnet-18
* resnet-50
* resnet-101
* resnet-152
* vgg-16
* vgg-19
* inception-v4
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-blob=OUTPUT name of the output layer (default is 'prob')
--threshold=CONF minimum confidence threshold for classification (default is 0.01)
--smoothing=WEIGHT weight between [0,1] or number of frames (disabled by default)
--profile enable layer profiling in TensorRT
videoSource arguments:
input resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* webrtc://@:1234/my_stream (WebRTC stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-save=FILE path to video file for saving the input stream to disk
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-decoder=TYPE the decoder engine to use, one of these:
* cpu
* omx (aarch64/JetPack4 only)
* v4l2 (aarch64/JetPack5 only)
--input-flip=FLIP flip method to apply to input:
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* rtsp://@:8554/my_stream (RTSP stream)
* webrtc://@:1234/my_stream (WebRTC stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--output-encoder=TYPE the encoder engine to use, one of these:
* cpu
* omx (aarch64/JetPack4 only)
* v4l2 (aarch64/JetPack5 only)
--output-save=FILE path to a video file for saving the compressed stream
to disk, in addition to the primary output above
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
Note: for basic image and video stream operations, see 《Linux 36.3 + JetPack v6.0@jetson-inference之视频操作》.
2.2 Downloading the Models
There are two ways to get the models:
- when the imageNet object is created, initialization downloads them automatically
- manually place the model files under the data/networks/ directory
Within mainland China, the "Great Firewall" is a real obstacle for beginners like us who are just getting off the ground. Those with the means can refer to 《apt-get通过代理更新系统》 (updating the system through an apt proxy) to set up network access.
That said, NVIDIA has kindly provided a workaround: all of the models are also hosted at a location reachable from mainland China: GitHub - model-mirror-190618
--network=NETWORK pre-trained model to load, one of the following:
* alexnet
* googlenet (default)
* googlenet-12
* resnet-18
* resnet-50
* resnet-101
* resnet-152
* vgg-16
* vgg-19
* inception-v4
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
Based on the model information above, the command supports:
- alexnet
- googlenet (default)
- googlenet-12
- resnet-18
- resnet-50
- resnet-101
- resnet-152
- vgg-16
- vgg-19
- inception-v4
- custom models (these require a standard model file: caffemodel, uff, or onnx; see the sketch below)
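For the custom-model case, the same options that back the --model, --labels, --input-blob and --output-blob flags can also be passed directly to the imageNet constructor in Python. This mirrors the commented-out example in section 3.1; the paths and blob names below are placeholders and must match your exported model:

from jetson_inference import imageNet

# load a custom ONNX classifier instead of a built-in network
net = imageNet(model="model/resnet18.onnx", labels="model/labels.txt",
               input_blob="input_0", output_blob="output_0")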
As an example, let's download the googlenet (default) model:
$ mkdir model-mirror-190618
$ cd model-mirror-190618
$ wget https://github.com/dusty-nv/jetson-inference/releases/download/model-mirror-190618/GoogleNet.tar.gz
$ mkdir -p ../data/networks/Googlenet
$ tar -zxvf GoogleNet.tar.gz -C ../data/networks/Googlenet
$ cd ..
Note: take care when downloading this model to place the extracted files in the Googlenet directory.
2.3 Usage Examples
It loads one or more images, runs inference with TensorRT and the imageNet class, then overlays the classification results and saves the output image. The project ships with sample images for you to use, located under the images/ directory.
- What’s wrong with imagenet, continous printf?
$ cd build/aarch64/bin/
2.3.1 Single Image
# C++
$ ./imagenet images/orange_0.jpg images/test/output_imagenet_cpp.jpg
# Python
$ ./imagenet.py images/strawberry_0.jpg images/test/output_imagenet_python.jpg
2.3.2 Video
# Download test video (thanks to jell.yfish.us)
$ wget https://nvidia.box.com/shared/static/tlswont1jnyu3ix2tbf7utaekpzcx4rc.mkv -O jellyfish.mkv
# C++
$ ./imagenet --network=resnet-18 ../../../jellyfish.mkv images/test/output_imagenet_jellyfish_cpp.mkv
# Python
$ ./imagenet.py --network=resnet-18 ../../../jellyfish.mkv images/test/output_imagenet_jellyfish_python.mkv
Only one of the output videos is shown here. In principle, since this is a probabilistic way of solving the problem, results computed at different times could differ; but the model is fixed and the computer keeps no memory between runs, so in theory the probabilities should be identical.
Which raises a question: the C++ and Python runs on the same photo do report different probabilities. What causes that?
[Output video: output_imagenet_jellyfish_cpp]
3. Code
3.1 Python
Import statements
├── sys
├── argparse
├── jetson_inference
│ └── imageNet
└── jetson_utils
├── videoSource
├── videoOutput
├── cudaFont
└── Log
Command line parsing
├── Create ArgumentParser
│ ├── description
│ ├── formatter_class
│ └── epilog
├── Add arguments
│ ├── input
│ ├── output
│ ├── --network
│ └── --topK
└── Parse arguments
├── try
│ └── args = parser.parse_known_args()[0]
└── except
├── print("")
├── parser.print_help()
└── sys.exit(0)
Load the recognition network
└── net = imageNet(args.network, sys.argv)
Optional hard-coded model loading (commented out)
└── net = imageNet(model="model/resnet18.onnx", labels="model/labels.txt",
input_blob="input_0", output_blob="output_0")
Create video sources & outputs
├── input = videoSource(args.input, argv=sys.argv)
├── output = videoOutput(args.output, argv=sys.argv)
└── font = cudaFont()
Process frames until EOS or user exits
└── while True
├── Capture the next image
│ ├── img = input.Capture()
│ └── if img is None
│ └── continue
├── Classify the image and get the topK predictions
│ └── predictions = net.Classify(img, topK=args.topK)
├── Draw predicted class labels
│ └── for n, (classID, confidence) in enumerate(predictions)
│ ├── classLabel = net.GetClassLabel(classID)
│ ├── confidence *= 100.0
│ ├── print(f"imagenet: {confidence:05.2f}% class #{classID} ({classLabel})")
│ └── font.OverlayText(img, text=f"{confidence:05.2f}% {classLabel}",
│ x=5, y=5 + n * (font.GetSize() + 5),
│ color=font.White, background=font.Gray40)
├── Render the image
│ └── output.Render(img)
├── Update the title bar
│ └── output.SetStatus("{:s} | Network {:.0f} FPS".format(net.GetNetworkName(), net.GetNetworkFPS()))
├── Print out performance info
│ └── net.PrintProfilerTimes()
└── Exit on input/output EOS
└── if not input.IsStreaming() or not output.IsStreaming()
└── break
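Condensing the tree above, the core of imagenet.py is roughly the following (a sketch with hard-coded, placeholder URIs in place of command-line parsing, and the default top-1 prediction instead of --topK):

from jetson_inference import imageNet
from jetson_utils import videoSource, videoOutput, cudaFont

net = imageNet("googlenet")                  # or resnet-18, etc.
input = videoSource("file://my_video.mp4")   # placeholder input URI
output = videoOutput("file://out.mp4")       # placeholder output URI
font = cudaFont()

while True:
    img = input.Capture()
    if img is None:                          # capture timeout, try again
        continue

    class_id, confidence = net.Classify(img) # top-1 prediction
    label = net.GetClassLabel(class_id)
    font.OverlayText(img, text=f"{confidence * 100:05.2f}% {label}",
                     x=5, y=5, color=font.White, background=font.Gray40)

    output.Render(img)
    output.SetStatus(f"{net.GetNetworkName()} | Network {net.GetNetworkFPS():.0f} FPS")

    if not input.IsStreaming() or not output.IsStreaming():
        break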
3.2 C++
#include statements
├── "videoSource.h"
├── "videoOutput.h"
├── "cudaFont.h"
├── "imageNet.h"
└── <signal.h>
Global variables
└── bool signal_recieved = false;
Function definitions
├── void sig_handler(int signo)
│ └── if (signo == SIGINT)
│ ├── LogVerbose("received SIGINT\n");
│ └── signal_recieved = true;
└── int usage()
├── printf("usage: imagenet [--help] [--network=NETWORK] ...\n");
├── printf(" input_URI [output_URI]\n\n");
├── printf("Classify a video/image stream using an image recognition DNN.\n");
├── printf("See below for additional arguments that may not be shown above.\n\n");
├── printf("optional arguments:\n");
├── printf(" --help show this help message and exit\n");
├── printf(" --network=NETWORK pre-trained model to load (see below for options)\n");
├── printf(" --topK=N show the topK number of class predictions (default: 1)\n");
├── printf("positional arguments:\n");
├── printf(" input_URI resource URI of input stream (see videoSource below)\n");
├── printf(" output_URI resource URI of output stream (see videoOutput below)\n\n");
├── printf("%s", imageNet::Usage());
├── printf("%s", videoSource::Usage());
├── printf("%s", videoOutput::Usage());
└── printf("%s", Log::Usage());
main function
├── Parse command line
│ ├── commandLine cmdLine(argc, argv);
│ └── if (cmdLine.GetFlag("help"))
│ └── return usage();
├── Attach signal handler
│ └── if (signal(SIGINT, sig_handler) == SIG_ERR)
│ └── LogError("can't catch SIGINT\n");
├── Create input stream
│ ├── videoSource* input = videoSource::Create(cmdLine, ARG_POSITION(0));
│ └── if (!input)
│ ├── LogError("imagenet: failed to create input stream\n");
│ └── return 1;
├── Create output stream
│ ├── videoOutput* output = videoOutput::Create(cmdLine, ARG_POSITION(1));
│ └── if (!output)
│ ├── LogError("imagenet: failed to create output stream\n");
│ └── return 1;
├── Create font for image overlay
│ ├── cudaFont* font = cudaFont::Create();
│ └── if (!font)
│ ├── LogError("imagenet: failed to load font for overlay\n");
│ └── return 1;
├── Create recognition network
│ ├── imageNet* net = imageNet::Create(cmdLine);
│ └── if (!net)
│ ├── LogError("imagenet: failed to initialize imageNet\n");
│ └── return 1;
│ ├── const int topK = cmdLine.GetInt("topK", 1); // default top result
├── Processing loop
│ └── while (!signal_recieved)
│ ├── uchar3* image = NULL;
│ ├── int status = 0;
│ ├── if (!input->Capture(&image, &status))
│ │ └── if (status == videoSource::TIMEOUT)
│ │ └── continue;
│ │ └── break; // EOS
│ ├── imageNet::Classifications classifications; // classID, confidence
│ ├── if (net->Classify(image, input->GetWidth(), input->GetHeight(), classifications, topK) < 0)
│ │ └── continue;
│ ├── for (uint32_t n=0; n < classifications.size(); n++)
│ │ ├── const uint32_t classID = classifications[n].first;
│ │ ├── const char* classLabel = net->GetClassLabel(classID);
│ │ ├── const float confidence = classifications[n].second * 100.0f;
│ │ ├── LogVerbose("imagenet: %2.5f%% class #%i (%s)\n", confidence, classID, classLabel);
│ │ ├── char str[256];
│ │ ├── sprintf(str, "%05.2f%% %s", confidence, classLabel);
│ │ └── font->OverlayText(image, input->GetWidth(), input->GetHeight(),
│ │ str, 5, 5 + n * (font->GetSize() + 5),
│ │ make_float4(255,255,255,255), make_float4(0,0,0,100));
│ ├── if (output != NULL)
│ │ ├── output->Render(image, input->GetWidth(), input->GetHeight());
│ │ ├── char str[256];
│ │ ├── sprintf(str, "TensorRT %i.%i.%i | %s | Network %.0f FPS", NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, net->GetNetworkName(), net->GetNetworkFPS());
│ │ └── output->SetStatus(str);
│ │ └── if (!output->IsStreaming())
│ │ └── break;
│ └── net->PrintProfilerTimes();
├── Destroy resources
│ ├── LogVerbose("imagenet: shutting down...\n");
│ ├── SAFE_DELETE(input);
│ ├── SAFE_DELETE(output);
│ └── SAFE_DELETE(net);
├── LogVerbose("imagenet: shutdown complete.\n");
└── return 0;
4. References
【1】jetson-inference - Classifying Images with ImageNet
5. Supplementary Notes
5.1 First-Run Local Model Initialization
The first time a network is run, even though the model is pre-trained, local deployment still goes through a lengthy initialization step. It appears to be TensorRT building an optimized engine for the local GPU and caching it (the serialized .engine file mentioned in the issues below), so that subsequent runs start much faster; the details still need further study. A quick timing sketch follows the links below.
Note: if you know exactly why this happens, please tell me in the comments. Thanks!
- imagenet can’t work as readme says, see attached log #1858
- could not find engine cache … MonoDepth-FCN-Mobilenet/monodepth_fcn_mobilenet.onnx.1.1.8602.GPU.FP16.engine ? #1855
- What’s wrong with imagenet/detectnet, continous printf?
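A simple way to observe this first-run cost is to time the network construction: the first run spends most of its time on the (apparent) engine build, while running the same script again should load far faster because the cached engine is reused. A sketch, assuming the googlenet model is already downloaded:

import time
from jetson_inference import imageNet

start = time.time()
net = imageNet("googlenet")  # first run: slow (engine build); later runs: fast (cached engine reload)
print(f"network load time: {time.time() - start:.1f} s")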
5.2 Samba Symlinks
Note: replace share with your Samba share name, e.g. home
- Ubuntu 22.04 configuration
[global]
allow insecure wide links = yes
[share]
follow symlinks = yes
wide links = yes
- Earlier versions
[global]
unix extensions = no
[share]
follow symlinks = yes
wide links = yes