Patch-Wise Graph Contrastive Learning for Image Translation


Chanyong Jung¹, Gihyun Kwon¹, Jong Chul Ye¹,²
Abstract

Recently, patch-wise contrastive learning has drawn attention in image translation by exploring the semantic correspondence between the input and output images. To further explore the patch-wise topology for high-level semantic understanding, we exploit a graph neural network (GNN) to capture topology-aware features. Specifically, we construct a graph based on the patch-wise similarity from a pretrained encoder, and share its adjacency matrix to enhance the consistency of the patch-wise relations between the input and the output. We then obtain node features from the GNN and enhance the correspondence between nodes by increasing their mutual information with a contrastive loss. To capture the hierarchical semantic structure, we further propose graph pooling. Experimental results demonstrate state-of-the-art performance in image translation, owing to the semantic encoding provided by the constructed graphs.

Introduction

The image-to-image translation task is a conditional image generation task in which the model converts the input image into the target domain while preserving the content structure of the given input image. Seminal image translation models used a paired training setting (Isola et al. 2017) or cycle-consistency training (Zhu et al. 2017) for content preservation. However, these models require paired datasets or a complex training procedure with additional networks. To overcome these problems, later works introduced one-sided image translation by removing the cycle-consistency (Fu et al. 2019; Benaim and Wolf 2017).

Figure 1: The semantic connectivity of the input is extracted by the encoder and shared to construct the graph networks. We maximize the mutual information between the nodes.

Recently, inspired by the success of contrastive learning strategies, Contrastive Unpaired Translation (CUT) (Park et al. 2020) was proposed to enhance the correspondence between the input and the output images through patch-wise contrastive learning. Patch-wise contrastive learning was further improved by exploring patch-wise relations, such as adversarial hard negative samples (Wang et al. 2021), a patch-wise similarity map (Zheng, Cham, and Cai 2021), or consistency regularization combined with hard negative mining based on patch-wise relations (Jung, Kwon, and Ye 2022). Although these methods show meaningful performance improvements, they share a limitation: they focus only on individual point-wise matching for each pair and do not consider the topology formed with neighboring patches (Zhou et al. 2021).

To further explore the semantic relationships between patches, this paper treats image translation as topology-aware representation learning, as shown in Fig. 1. Specifically, we propose a novel framework based on patch-wise graph contrastive learning using a Graph Neural Network (GNN), which is commonly used to extract features that account for topological structure.

Several existing works have utilized GNNs to capture topology-aware features for various tasks. Hierarchical representations with graph partitioning were proposed for unsupervised segmentation (Melas-Kyriazi et al. 2022; Wang et al. 2022), and topology-aware representations (Han et al. 2022) were extracted based on the semantic connectivity between image regions. For knowledge distillation, holistic knowledge between data points was proposed (Zhou et al. 2021), and its effectiveness in encoding the topological knowledge of the teacher model was verified.

Despite their strong performance in various vision tasks, no prior work has explored topology-aware features that consider the implicit patch-wise semantic connections for image-to-image translation. Accordingly, we employ a GNN to use the patch-wise connections of the input image as prior knowledge for patch-wise contrastive learning. Specifically, we use a pre-trained network to extract patch-wise features for the input and output images. We then compute the adjacency matrix from the semantic relations between the patches of the input image and share it with the output image graph. We construct two graphs, one for the input and one for the output, from the adjacency matrix and the patch features, and obtain the node features by graph convolution. By maximizing the mutual information (MI) between the nodes of the input graph and the output graph through a contrastive loss, we enhance the patch correspondence for the image translation task. Furthermore, to extract the semantic correspondence in a hierarchical manner, we propose a graph pooling technique that resembles the attention mechanism.

Our contributions can be summarized as follows:

  • We propose a GNN-based framework that captures topology-aware semantic representations by exploiting the patch-wise consistency between the input and translated output images.
  • We propose sharing the adjacency matrix so that the patch-wise connections of the input image serve as prior knowledge for contrastive learning.
  • To further exploit hierarchical semantic relationships, we propose graph pooling, which provides a focused view of the graph.
  • Experimental results on five different datasets demonstrate state-of-the-art performance, achieved by producing semantically meaningful graphs.

Related Works

Patch-Wise Contrastive Learning for Images

From a patch-level view, an image contains diverse local semantics. The relational knowledge between patches embodies the correlation between regions, and it has been utilized for various image generation tasks.

For example, patch-wise contrastive relations (Park et al. 2020; Wang et al. 2021) are utilized for image translation. Similarly, a patch similarity map obtained from a pretrained encoder (Zheng, Cham, and Cai 2021) was suggested. More recently, a patch-level self-correlation map (Zhan et al. 2022a), a query selection module based on patch-wise similarity (Hu et al. 2022), and an optimal transport plan based on a patch-wise cost matrix (Zhan et al. 2022b) have been proposed. In addition, the consistency of the patch-wise semantic relations between the input and output images was exploited to further improve their correspondence (Jung, Kwon, and Ye 2022). For style transfer in particular, the consistency of patch-level relations extracted by a vision transformer was studied (Tumanyan et al. 2022; Bar-Tal et al. 2022); these methods use the relations between image tokens to preserve regional correspondence.

Figure 2: (a) Overall framework of the proposed method. We impose patch-wise regularization through the GNN constructed by the encoder $E$. We extract the node features $Z, V$ and maximize $I(Z;V)$. Pooled graphs are utilized to focus on task-relevant nodes. (b) Motivation for using the patch-wise connections of the input image as prior knowledge.

Graph Neural Network

A graph neural network (GNN) learns representations that account for the connectivity of graph-structured data (Kipf and Welling 2017; Du et al. 2017). Each node feature models an individual data point and its relations to other data points by aggregating information from its neighboring nodes.

Owing to the success of GNNs in capturing topology-aware features (Xie, Xu, and Ji 2022; Wu et al. 2022; Yuan and Ji 2020), they are actively used in various computer vision tasks. For example, GNNs are used to capture local features for finding image correspondences (Sarlin et al. 2020) and multi-modal features for action segmentation in videos (Zhang, Tsai, and Tsai 2022). In particular, GNN-based knowledge distillation methods (Zhou et al. 2021; Lassance et al. 2020) have been proposed; by transferring additional knowledge about instance-wise relations, they are reported to outperform the conventional contrastive loss.

Recently, graphs constructed from patch-wise relations were suggested for capturing visual features. Graph partitioning methods are employed for unsupervised segmentation (Melas-Kyriazi et al. 2022; Wang et al. 2022), where the graph is obtained from the token-wise similarity of a vision transformer. Vision GNN (Han et al. 2022) introduced a GCN-based architecture to extract topology-aware representations and showed superior performance to widely used models such as CNNs and vision transformers.

Method

Inspired by these works, we aim to exploit the patch-wise relations that represent the semantic topology of an image. In particular, we focus on topology-aware features computed on a graph formed by the semantic relations of patches, and explore how these features improve task performance.

Specifically, our method is motivated by the consistency of the patch-wise semantic connections in the input and output images, as shown in Fig. 2(b). If the patch features $(z_i, z_j)$ are semantically connected in the input image, then the patches $(v_i, v_j)$ at the corresponding locations of the output should also be connected. Motivated by this, we present a method that uses the topology of the patch-wise connections of the input image as prior knowledge.

More specifically, we capture topology-aware patch features with a GNN, where the patch-wise connections are given by the shared adjacency matrix $A$. We then obtain the node features $Z=\{z_i\}_{i=1}^{N}$ and $V=\{v_i\}_{i=1}^{N}$ and maximize the node-wise MI with a contrastive loss. We also use graph pooling to maximize the MI within a task-relevant, focused view of the graph. Details follow.

Graph Representation for Image Translation

We first construct the graph for the input image, $g_i=\{A, F_i\}$, where $A$ is the adjacency matrix and $F_i$ contains the node features that represent image patches. Specifically, we randomly sample $N$ patch features $f_n\in\mathbb{R}^{c}$ from the dense feature $F=E(x)\in\mathbb{R}^{c\times h\times w}$, which is obtained from an intermediate layer of the model $E$, where $c, h, w$ denote the number of channels, the height, and the width, respectively. We set these $N$ features as the nodes of the graph $g_i$ (i.e., $F_i=[f_1,\ldots,f_N]$).

Then, we obtain the adjacency matrix $A\in\mathbb{R}^{N\times N}$ from the cosine similarity of the patch features: two patches are connected if their similarity is at least the predefined threshold $t$, and disconnected otherwise. Specifically, the connectivity $A_{ij}$ for features $f_i, f_j$ is computed by

$$A_{ij} := \begin{cases} 1 & \text{if } \cos(f_i, f_j) \ge t \\ 0 & \text{if } \cos(f_i, f_j) < t \end{cases} \tag{1}$$
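A minimal sketch of Eq. (1) in plain Python, assuming the node features are given as lists of floats. The threshold value below is arbitrary (this section does not state $t$), and `cosine` and `build_adjacency` are illustrative names, not the authors' code. Because $\cos(f_i, f_i)=1$, any threshold below 1 puts ones on the diagonal, so every node is also connected to itself.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def build_adjacency(features: List[Vector], t: float) -> List[List[int]]:
    """Eq. (1): A[i][j] = 1 if cos(f_i, f_j) >= t, else 0."""
    n = len(features)
    return [
        [1 if cosine(features[i], features[j]) >= t else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy example: three 2-D patch features and an arbitrary threshold t = 0.8.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = build_adjacency(patches, t=0.8)
print(A)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

The first two patches point in nearly the same direction (cosine ≈ 0.994), so they are linked; the third is orthogonal to the first and only weakly similar to the second, so it stays isolated apart from its self-connection.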

We construct the output graph $g_o=\{A, F_o\}$ in a similar way. We sample $N$ features $f'_n\in\mathbb{R}^{c}$ from the corresponding locations of the dense feature $F'=E\circ G(x)\in\mathbb{R}^{c\times h\times w}$ and set them as the nodes of the graph $g_o$ (i.e., $F_o=[f'_1,\ldots,f'_N]$). To retain the topological correspondence between the patches, the output graph inherits the adjacency matrix $A$ from the input graph, as shown in Fig. 3.
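The input and output graphs have to be built from the same spatial positions so that node $i$ of $F_i$ and node $i$ of $F_o$ describe corresponding patches. A small sketch of that sampling step, assuming dense features stored as nested `[c][h][w]` lists; `sample_locations` and `gather_nodes` are hypothetical helper names, and sampling without replacement is an assumption consistent with the "256 different patches" mentioned in the implementation details.

```python
import random
from typing import List, Tuple

Dense = List[List[List[float]]]  # dense feature map indexed as [channel][y][x]


def sample_locations(h: int, w: int, n: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Draw n distinct (y, x) positions uniformly from an h x w grid."""
    flat = rng.sample(range(h * w), n)
    return [(k // w, k % w) for k in flat]


def gather_nodes(feat: Dense, locs: List[Tuple[int, int]]) -> List[List[float]]:
    """Build an n x c node matrix: one c-dim feature vector per sampled position."""
    return [[channel[y][x] for channel in feat] for (y, x) in locs]


# Toy dense features with c = 2, h = w = 4; channel 0 stores the flat index y*4 + x.
c, h, w = 2, 4, 4
F_in = [[[float(ch * 100 + y * w + x) for x in range(w)] for y in range(h)] for ch in range(c)]
F_out = [[[v + 0.5 for v in row] for row in plane] for plane in F_in]  # stand-in for E(G(x))

rng = random.Random(0)
locs = sample_locations(h, w, n=3, rng=rng)
F_i = gather_nodes(F_in, locs)   # nodes of the input graph g_i
F_o = gather_nodes(F_out, locs)  # nodes of the output graph g_o, same positions
```

Drawing the positions once and reusing them for both feature maps is what makes sharing the adjacency matrix meaningful: row $i$ of $A$ refers to the same image location in both graphs.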

Figure 3: Construction of the graphs $g_i, g_o$ with the shared adjacency matrix $A$. Each graph extracts $l$-hop features $Z, V$ from the given node features $F_i, F_o$.

Next, we obtain the graph representations $Z, V$ from the graphs $g_i, g_o$ using the Topology Adaptive Graph Convolutional Network (Du et al. 2017) as follows:

$$Z = \sum_{l=0}^{L} (\bar{A})^{l}\, F_i W_l \tag{2}$$
$$V = \sum_{l=0}^{L} (\bar{A})^{l}\, F_o W_l \tag{3}$$

where $\bar{A}$ is the normalized adjacency matrix and $W_l$ is the parameter for the $l$-th hop, shared between Eqs. (2) and (3). We use 2-hop representations (i.e., $L=2$).
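Eqs. (2) and (3) apply the same weights $W_0,\ldots,W_L$ to powers of the shared $\bar{A}$. A plain-Python sketch follows. The text does not spell out how $\bar{A}$ is normalized; this sketch assumes the symmetric normalization $D^{-1/2} A D^{-1/2}$ ($D$ being the degree matrix) commonly used for TAGCN layers, and all names are illustrative. Instead of forming $\bar{A}^l$ explicitly, it repeatedly multiplies the propagated features by $\bar{A}$.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sym_normalize(adj: Matrix) -> Matrix:
    """Assumed normalization: A_bar = D^{-1/2} A D^{-1/2}."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def tag_conv(adj: Matrix, feats: Matrix, weights: List[Matrix]) -> Matrix:
    """Eq. (2)/(3): sum_{l=0}^{L} A_bar^l F W_l, where L = len(weights) - 1."""
    a_bar = sym_normalize(adj)
    prop = feats  # holds A_bar^l F, starting with l = 0
    out = None
    for w in weights:
        term = matmul(prop, w)
        out = term if out is None else [[x + y for x, y in zip(r1, r2)]
                                        for r1, r2 in zip(out, term)]
        prop = matmul(a_bar, prop)  # advance to A_bar^{l+1} F
    return out


# Shared adjacency (from Eq. (1)) and shared 2x2 weights for L = 2 hops.
A = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
W = [I2, I2, I2]  # W_0, W_1, W_2 (identity only to keep the numbers readable)
F_i = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
F_o = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.9]]
Z = tag_conv(A, F_i, W)  # Eq. (2)
V = tag_conv(A, F_o, W)  # Eq. (3): same A and same W_l
```

With these toy inputs, the first two nodes mix each other's features (their rows of $Z$ come out as roughly `[2, 1]` and `[1, 2]`), while the isolated third node only accumulates its own feature over the three hops (`[3, 3]`).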

Finally, to enforce the topological correspondence between the input $X$ and the output $G(X)$ of a given generator $G$, we maximize the mutual information between the nodes of $Z$ and $V$ with the InfoNCE loss (Oord, Li, and Vinyals 2018) as follows:

$$\mathcal{L}_{GNN}(X, G(X)) = -\frac{1}{N}\sum_{i=1}^{N}\left[\log\frac{\exp(z_i^{\top} v_i)}{\sum_{j=1}^{N}\exp(z_i^{\top} v_j)}\right] \tag{4}$$

where $z_i$ and $v_i$ are the $i$-th node features of $Z$ (computed from $X$) and $V$ (computed from $G(X)$), respectively.
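Eq. (4) is a softmax cross-entropy in which node $i$ of the output graph is the positive for node $i$ of the input graph and the other $N-1$ output nodes act as negatives. A minimal sketch that evaluates the loss as printed, using the log-sum-exp trick for numerical stability (an implementation choice, not something the text specifies). Eq. (4) has no temperature; CUT (Park et al. 2020) divides the similarities by a temperature τ, which would be a one-line change here.

```python
import math
from typing import List

Matrix = List[List[float]]


def info_nce(z: Matrix, v: Matrix) -> float:
    """Eq. (4): -(1/N) * sum_i log( exp(z_i . v_i) / sum_j exp(z_i . v_j) )."""
    n = len(z)
    total = 0.0
    for i in range(n):
        logits = [sum(a * b for a, b in zip(z[i], v[j])) for j in range(n)]
        m = max(logits)  # shift so exp() cannot overflow
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denominator - logits[i]  # -log(softmax probability of the positive)
    return total / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True: matching positives give the lower loss
```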

When $L=0$, the proposed method reduces to conventional patch-wise contrastive learning with $W_0$ as the projector network. From this perspective, our method generalizes conventional contrastive learning by using higher-order features obtained through graph aggregation (i.e., $L>0$).
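Expanding Eq. (2) for the two settings discussed here makes the reduction explicit. With $L=0$ only the $(\bar{A})^{0}=I$ term survives, so no neighbor information enters and each node is simply projected by $W_0$; with $L=2$, the setting used in this work, the 1-hop and 2-hop neighborhood aggregates are added. The same expansion holds for $V$ with $F_o$ in place of $F_i$.

```latex
\begin{aligned}
Z\big|_{L=0} &= (\bar{A})^{0} F_i W_0 = F_i W_0,\\
Z\big|_{L=2} &= F_i W_0 + \bar{A} F_i W_1 + \bar{A}^{2} F_i W_2 .
\end{aligned}
```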

Graph Pooling for Focused Attention

We pool the graph nodes to obtain a task-relevant, focused view of the graph. In other words, we downsample the nodes according to their relevance to the task and construct a graph with fewer nodes that focuses on the task-relevant ones.

Specifically, following top-$K$ pooling (Gao and Ji 2019), we select $K$ of the $N$ nodes $Z=[z_1,\ldots,z_N]$ according to the score $s_i=p^{\top}z_i$, where $p$ is a learnable pooling vector. Accordingly, the adjacency matrix $A_p\in\mathbb{R}^{K\times K}$ of the pooled graph is constructed by removing the connections to non-selected nodes from the original matrix $A$. The selected nodes are then weighted by their scores $S$ passed through the sigmoid function $\sigma$:

$$Z_{p,in} = \sigma(S)\, Z \tag{5}$$
$$V_{p,in} = \sigma(S)\, V \tag{6}$$

which become the input nodes of the pooled graphs. Then, the $L$-hop features are obtained as:

$$Z_p = \sum_{l=0}^{L} (\overline{A_p})^{l}\, Z_{p,in} W_{p,l} \tag{7}$$
$$V_p = \sum_{l=0}^{L} (\overline{A_p})^{l}\, V_{p,in} W_{p,l} \tag{8}$$

where $W_{p,l}$ are the parameters of the pooled GNN. Having constructed the pooled graphs $g_i^p=\{A_p, Z_p\}$ and $g_o^p=\{A_p, V_p\}$ and obtained their $l$-hop node features, we again employ the InfoNCE loss to maximize the MI between the nodes of the pooled graphs as follows:

$$\mathcal{L}_{GNN}^{p}(X, G(X)) = -\frac{1}{K}\sum_{i=1}^{K}\left[\log\frac{\exp(z_{p,i}^{\top} v_{p,i})}{\sum_{j=1}^{K}\exp(z_{p,i}^{\top} v_{p,j})}\right] \tag{9}$$
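A sketch of the pooling step (scoring, top-$K$ selection, adjacency restriction, sigmoid gating) in plain Python. Following Eqs. (5) and (6) as written, the gates computed from $Z$ are applied to the selected rows of both $Z$ and $V$, so node $i$ of the two pooled graphs still refers to the same patch; the function and variable names are hypothetical. The original gPool layer (Gao and Ji 2019) additionally divides the score by $\|p\|$; the text here omits that, and so does the sketch. The gated features would then pass through Eqs. (7) and (8) and the $K$-node loss of Eq. (9).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def top_k_pool(adj: Matrix, z: Matrix, v: Matrix, p: List[float], k: int
               ) -> Tuple[List[int], Matrix, Matrix, Matrix]:
    """Select K nodes by s_i = p^T z_i, restrict A, and gate features with sigmoid(s)."""
    scores = [sum(pj * zj for pj, zj in zip(p, row)) for row in z]
    idx = sorted(range(len(z)), key=lambda i: scores[i], reverse=True)[:k]
    a_p = [[adj[i][j] for j in idx] for i in idx]  # K x K: edges among kept nodes only
    gates = [sigmoid(scores[i]) for i in idx]
    z_in = [[g * x for x in z[i]] for g, i in zip(gates, idx)]  # Eq. (5)
    v_in = [[g * x for x in v[i]] for g, i in zip(gates, idx)]  # Eq. (6), same gates
    return idx, a_p, z_in, v_in


adj = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
Z = [[0.0], [2.0], [1.0], [3.0]]
V = [[10.0], [20.0], [30.0], [40.0]]
idx, A_p, Z_p_in, V_p_in = top_k_pool(adj, Z, V, p=[1.0], k=2)
print(idx, A_p)  # [3, 1] [[1, 0], [0, 1]]
```

In this toy chain graph, nodes 3 and 1 score highest; they are not neighbors in $A$, so the restricted $A_p$ keeps only their self-connections.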

Figure 4: Top-$K$ graph pooling (Gao and Ji 2019). The pooling vector $p$ provides a focused view of the graph for the given task. The final node features are also weighted using $p$.

Here, it is worth noting how graph pooling contributes to the improvement. As shown in Fig. 4, the vector $p$ learns to focus on the important nodes, as determined by their task relevance. This is analogous to conventional attention methods (Woo et al. 2018; Park et al. 2018), as shown in Fig. 5. Therefore, graph pooling can be viewed as node-wise attention, which imposes stronger regularization on the important nodes to enhance the correspondence for the image translation task.

Figure 5: Top-$K$ graph pooling allocates higher weights to informative nodes, similar to the attention mechanism. (a) Top-$K$ graph pooling. (b) Attention method.

Overall Loss Function

Our method is a one-sided image translation model without cycle-consistency, inspired by related works based on patch-wise contrastive learning (Jung, Kwon, and Ye 2022; Park et al. 2020; Wang et al. 2021; Zheng, Cham, and Cai 2021). Specifically, the overall loss is given as follows:

$$\mathcal{L}_{total} = \mathcal{L}_{GAN}(G, D) + \lambda_g \sum_{p=0}^{P}\mathcal{L}_{GNN}^{p}(X, G(X)) + \lambda_g \sum_{p=0}^{P}\mathcal{L}_{GNN}^{p}(Y, G(Y)) \tag{10}$$

with the generator $G$ and discriminator $D$ shown in Fig. 2(a). $\mathcal{L}_{GAN}$ is the LSGAN loss (Mao et al. 2017), given as:

$$\mathcal{L}_{GAN} = \mathbb{E}_{y\sim p_Y}\left[\|D(y)\|_2^2\right] + \mathbb{E}_{x\sim p_X}\left[\|1 - D(G(x))\|_2^2\right] \tag{11}$$

with the distributions $p_X$ and $p_Y$ of the source and target domains. Additionally, as suggested by Park et al. (2020), we use the identity term $\mathcal{L}_{GNN}^{p}(Y, G(Y))$ on the target domain images $Y$ to stabilize training. $\mathcal{L}_{GNN}^{p=0}$ refers to the graph loss without pooling.
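Eq. (10) adds the adversarial term to $\lambda_g$-weighted graph losses over all pooling levels $p=0,\ldots,P$, once for $(X, G(X))$ and once for the identity pair $(Y, G(Y))$. A small sketch of that bookkeeping, assuming each per-level loss has already been computed (e.g., with Eq. (4) for $p=0$ and Eq. (9) for pooled levels); $\lambda_g$ is not given in this section, so the numbers below are placeholders and `total_loss` is a hypothetical name.

```python
from typing import Sequence


def total_loss(gan_loss: float,
               graph_losses_x: Sequence[float],
               graph_losses_y: Sequence[float],
               lambda_g: float) -> float:
    """Eq. (10): L_GAN + lambda_g * sum_p L_GNN^p(X, G(X)) + lambda_g * sum_p L_GNN^p(Y, G(Y))."""
    if len(graph_losses_x) != len(graph_losses_y):
        raise ValueError("both sums must run over the same pooling levels p = 0..P")
    return gan_loss + lambda_g * sum(graph_losses_x) + lambda_g * sum(graph_losses_y)


# P = 1 (one pooling step): index 0 is the unpooled graph loss, index 1 the pooled one.
loss = total_loss(gan_loss=0.5, graph_losses_x=[1.0, 2.0], graph_losses_y=[0.5, 0.5],
                  lambda_g=2.0)  # placeholder numbers
print(loss)  # 8.5
```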

Figure 6: Qualitative comparison with related methods. Our results show enhanced input-output correspondence compared with previous methods.

Experimental Results

Implementation Details

We first evaluate our method on unpaired image translation using five datasets: horse→zebra, Label→Cityscape, map→satellite, summer→winter, and apple→orange. All images are resized to 256×256 for training and testing. We then also apply our method to high-resolution single image translation, following previous work (Park et al. 2020).

For graph construction, we randomly sample 256 different patches from the features of a pre-trained VGG16 network (Simonyan and Zisserman 2014) for both the input and output images. We extract the dense features from three different layers of the network (relu3-1, relu4-1, and relu4-3). For the graph operation, we set the number of GNN hops to 2 and the number of pooling steps to 1. For graph pooling, we downsample the nodes to 1/4; in other words, the initial graph has 256 nodes and the pooled graph has 64 nodes. More details are provided in the supplementary materials.
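The hyperparameters above can be collected in one place. The dataclass below is a hypothetical configuration object (field and class names are illustrative, not from any released code); its derived properties make the arithmetic explicit: $L=2$ hops means three weight matrices $W_0, W_1, W_2$ per GNN, one pooling step means $P=1$ so each sum in Eq. (10) has two graph-loss terms, and keeping 1/4 of 256 nodes leaves $K=64$.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GraphNCEConfig:
    image_size: int = 256
    num_patches: int = 256                      # N nodes in the initial graph
    vgg16_layers: Tuple[str, ...] = ("relu3-1", "relu4-1", "relu4-3")
    gnn_hops: int = 2                           # L in Eqs. (2)-(3)
    num_pooling: int = 1                        # P in Eq. (10)
    keep_ratio: float = 0.25                    # pooled graph keeps 1/4 of the nodes

    @property
    def pooled_nodes(self) -> int:
        """K for top-K pooling."""
        return int(self.num_patches * self.keep_ratio)

    @property
    def weights_per_gnn(self) -> int:
        """Number of hop weights W_0 ... W_L."""
        return self.gnn_hops + 1

    @property
    def graph_losses_per_direction(self) -> int:
        """Terms p = 0 .. P in each sum of Eq. (10)."""
        return self.num_pooling + 1


cfg = GraphNCEConfig()
print(cfg.pooled_nodes, cfg.weights_per_gnn, cfg.graph_losses_per_direction)  # 64 3 2
```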

Image-to-Image Translation

We compare our method with the two-sided domain translation models CycleGAN (Zhu et al. 2017) and MUNIT (Huang et al. 2018), and with the one-sided image translation models DistanceGAN (Benaim and Wolf 2017) and GcGAN (Fu et al. 2019). In particular, since our method is based on patch-wise contrastive learning, we also compare it with recent contrastive-learning-based methods: CUT (Park et al. 2020) as the baseline model, and its improved variants NEGCUT (Wang et al. 2021), SeSim (Zheng, Cham, and Cai 2021), and HnegSRC (Jung, Kwon, and Ye 2022).

Results

Figure 7: Closer views of the output images. Our method enhances the spatially specific information given in the input.

The results in Fig. 6 verify that the proposed method generates images with better visual quality than the other methods by enhancing the correspondence between the input and output images. Compared with the other methods, our method preserves the structural information of the input images by using the patch-wise connections of the input as prior knowledge.

We further compare our method with HnegSRC, which also utilizes the patch-wise semantic relations of the input. As shown in Fig. 7, by considering the patch-wise semantic neighborhood through the graph operation, our method better preserves spatially specific information than HnegSRC, which only imposes consistency regularization on the patch-wise similarity. Specifically, in Fig. 7(a) our method outputs a more realistic zebra with spatially specific patterns (e.g., a dark-colored mouth) that are absent from the compared result. Also, in Fig. 7(b) our result shows tree branches whose shapes are coherent with the input, whereas they are distorted in the compared method.

Table 1 also supports the superior performance of the proposed method. Specifically, on the horse→zebra and Label→Cityscape datasets, we obtain FID scores similar to HnegSRC but better (lower) KID scores. On the summer→winter and apple→orange datasets, our model outperforms the others by large margins, which demonstrates the effectiveness of the proposed model.

| Method | Horse→Zebra FID↓ | Horse→Zebra KID↓ | Label→Cityscape FID↓ | Label→Cityscape KID↓ | Map→Satellite FID↓ | Map→Satellite KID↓ | Summer→Winter FID↓ | Summer→Winter KID↓ | Apple→Orange FID↓ | Apple→Orange KID↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| CycleGAN | 77.2 | 1.957 | 76.3 | 3.532 | 54.6 | 3.430 | 84.9 | 1.022 | 174.6 | 10.051 |
| MUNIT | 133.8 | 3.790 | 91.4 | 6.401 | 181.7 | 12.03 | 115.4 | 4.901 | 207.0 | 12.853 |
| DistanceGAN | 72.0 | 1.856 | 81.8 | 4.410 | 98.1 | 5.789 | 97.2 | 2.843 | 181.9 | 11.362 |
| GCGAN | 86.7 | 2.051 | 105.2 | 6.824 | 79.4 | 5.153 | 97.5 | 2.755 | 178.4 | 10.828 |
| CUT | 45.5 | 0.541 | 56.4 | 1.611 | 56.1 | 3.301 | 84.3 | 1.207 | 171.5 | 9.642 |
| NEGCUT | 39.6 | 0.477 | 48.5 | 1.432 | 51.0 | 2.338 | 82.7 | 1.352 | 154.1 | 7.876 |
| LSeSIM | 38.0 | 0.422 | 49.7 | 2.867 | 52.4 | 3.205 | 83.9 | 1.230 | 168.6 | 10.386 |
| HnegSRC | 34.4 | 0.438 | 46.4 | 0.662 | 49.2 | 2.531 | 81.8 | 1.181 | 158.3 | 8.434 |
| Ours | 34.5 | 0.271 | 46.8 | 0.605 | 45.9 | 2.112 | 75.8 | 0.845 | 139.1 | 7.134 |

Table 1: Quantitative results (KID is reported as KID×100; lower is better for both metrics). Our model outperforms the baselines in both FID and KID×100.

Figure 8: Qualitative comparison on single image translation.

Figure 9: Analysis of the proposed method. (a) Input and output images. (b) Visualization of $\sigma(S_{in})$ and $\sigma(S_{out})$. The vector $p$ allocates higher weights to task-relevant object parts, and the similar appearance of the two maps indicates the correspondence between input and output. (c) Eigenvectors of the Laplacian matrix of $A$, which are coherent with the semantics of the image.

Figure 10: The adjacency matrix $A$ is constructed from $F_i$, which is the output of the learnable $h$. Here, $h$ is updated by the gradient from $F_o$, similar to CUT (Park et al. 2020).
