ArtNeRF、Attention Control、Pixel is a Barrier、FilterPrompt

本文首发于公众号:机器感知

ArtNeRF、Attention Control、Pixel is a Barrier、FilterPrompt

图片

ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis

图片

Recent advances in generative visual models and neural radiance fields have greatly boosted 3D-aware image synthesis and stylization tasks. However, previous NeRF-based work is limited to single scene stylization, training a model to generate 3D-aware cartoon faces with arbitrary styles remains unsolved. We propose ArtNeRF, a novel face stylization framework derived from 3D-aware GAN to tackle this problem. In this framework, we utilize an expressive generator to synthesize stylized faces and a triple-branch discriminator module to improve the visual quality and style consistency of the generated faces. Specifically, a style encoder based on contrastive learning is leveraged to extract robust low-dimensional embeddings of style images, empowering the generator with the knowledge of various styles. To smooth the training process of cross-domain transfer learning, we propose an adaptive style blending module which helps inject style information and allows users to freely tune t......

Rethink Arbitrary Style Transfer with Transformer and Contrastive  Learning

图片

Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality of stylized images. Firstly, we propose Style Consistency Instance Normalization (SCIN), a method to refine the alignment between content and style features. In addition, we have developed an Instance-based Contrastive Learning (ICL) approach designed to understand the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are more adept at extracting classification features and need to be better suited for capturing style features, we have also introduced the Perception Encoder (PE) to capture style fe......

Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text  Consistency and Domain Distribution Gap

图片

The recent advancements in Text-to-Video Artificial Intelligence Generated Content (AIGC) have been remarkable. Compared with traditional videos, the assessment of AIGC videos encounters various challenges: visual inconsistency that defy common sense, discrepancies between content and the textual prompt, and distribution gap between various generative models, etc. Target at these challenges, in this work, we categorize the assessment of AIGC video quality into three dimensions: visual harmony, video-text consistency, and domain distribution gap. For each dimension, we design specific modules to provide a comprehensive quality assessment of AIGC videos. Furthermore, our research identifies significant variations in visual quality, fluidity, and style among videos generated by different text-to-video models. Predicting the source generative model can make the AIGC video features more discriminative, which enhances the quality assessment performance. The proposed method was used......

LASER: Tuning-Free LLM-Driven Attention Control for Efficient  Text-conditioned Image-to-Animation

图片

Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit the diverse images that convey highly complex visual concepts according to the textual guidance. Despite being promising, existing methods focus on texture- or non-rigid-based visual manipulation, which struggles to produce the fine-grained animation of smooth text-conditioned image morphing without fine-tuning, i.e., due to their highly unstructured latent space. In this paper, we introduce a tuning-free LLM-driven attention control framework, encapsulated by the progressive process of LLM planning, prompt-Aware editing, StablE animation geneRation, abbreviated as LASER. LASER employs a large language model (LLM) to refine coarse descriptions into detailed prompts, guiding pre-trained text-to-image models for subsequent image generation. We manipulate the model's spatial features and self-attention mecha......

Generalizable Novel-View Synthesis using a Stereo Camera

图片

In this paper, we propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images. Since recent stereo matching has demonstrated accurate geometry prediction, we introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction. To this end, this paper proposes a novel framework, dubbed StereoNeRF, which integrates stereo matching into a NeRF-based generalizable view synthesis approach. StereoNeRF is equipped with three key components to effectively exploit stereo matching in novel-view synthesis: a stereo feature extractor, a depth-guided plane-sweeping, and a stereo depth loss. Moreover, we propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis. ......

Motion-aware Latent Diffusion Models for Video Frame Interpolation

图片

With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, the motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames with the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, culminating in generating both visually smooth and realistic results. Extensive experiments conducted on benchmark ......

Music Consistency Models

图片

Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. It has proven to be advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music Consistency Models (\texttt{MusicCM}), which leverages the concept of consistency models to efficiently synthesize mel-spectrogram for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the \texttt{MusicCM} model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended coherent music by incorporating multiple diffusion processes with shared constraints. Experimental results reveal the effectiveness of our model in terms of......

EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively  Constrained Global Bundle Adjustment

图片

We introduce EC-SLAM, a real-time dense RGB-D simultaneous localization and mapping (SLAM) system utilizing Neural Radiance Fields (NeRF). Although recent NeRF-based SLAM systems have demonstrated encouraging outcomes, they have yet to completely leverage NeRF's capability to constrain pose optimization. By employing an effectively constrained global bundle adjustment (BA) strategy, our system makes use of NeRF's implicit loop closure correction capability. This improves the tracking accuracy by reinforcing the constraints on the keyframes that are most pertinent to the optimized current frame. In addition, by implementing a feature-based and uniform sampling strategy that minimizes the number of ineffective constraint points for pose optimization, we mitigate the effects of random sampling in NeRF. EC-SLAM utilizes sparse parametric encodings and the truncated signed distance field (TSDF) to represent the map in order to facilitate efficient fusion, resulting in reduced mode......

Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than  We Think

图片

Adversarial examples for diffusion models are widely used as solutions for safety concerns. By adding adversarial perturbations to personal images, attackers can not edit or imitate them easily. However, it is essential to note that all these protections target the latent diffusion model (LDMs), the adversarial examples for diffusion models in the pixel space (PDMs) are largely overlooked. This may mislead us to think that the diffusion models are vulnerable to adversarial attacks like most deep models. In this paper, we show novel findings that: even though gradient-based white-box attacks can be used to attack the LDMs, they fail to attack PDMs. This finding is supported by extensive experiments of almost a wide range of attacking methods on various PDMs and LDMs with different model structures, which means diffusion models are indeed much more robust against adversarial attacks. We also find that PDMs can be used as an off-the-shelf purifier to effectively remove the adver......

FilterPrompt: Guiding Image Transfer in Diffusion Models

图片

In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately. Previous research has predominantly concentrated on disentangling image attributes within feature space. However, the complex distribution present in real-world data often makes the application of such decoupling algorithms to other datasets challenging. Moreover, the granularity of control over feature encoding frequently fails to meet specific task requirements. Upon scrutinizing the characteristics of various generative models, we have observed that the input sensitivity and dynamic evolution properties of the diffusion model can be effectively fused with the explicit decomposition operation in pixel space. This integration enables the ima......

BACS: Background Aware Continual Semantic Segmentation

图片

Semantic segmentation plays a crucial role in enabling comprehensive scene understanding for robotic systems. However, generating annotations is challenging, requiring labels for every pixel in an image. In scenarios like autonomous driving, there's a need to progressively incorporate new classes as the operating environment of the deployed agent becomes more complex. For enhanced annotation efficiency, ideally, only pixels belonging to new classes would be annotated. This approach is known as Continual Semantic Segmentation (CSS). Besides the common problem of classical catastrophic forgetting in the continual learning setting, CSS suffers from the inherent ambiguity of the background, a phenomenon we refer to as the "background shift'', since pixels labeled as background could correspond to future classes (forward background shift) or previous classes (backward background shift). As a result, continual learning approaches tend to fail. This paper proposes a Backward Backgro......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/573901.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

【笔试训练】day11

1.游游的水果大礼包 思路: 枚举。假设最后的答案是x个a礼包,y个b礼包,得到一个式子:ansa*xb*y 我们可以枚举x的数量,这样就能变相的把y的求出来。呃这就是鸡兔同笼问题嘛 x最大的范围是多少呢?也就是a礼…

【CouchDB 与 PouchDB】

CouchDB是什么 CouchDB,全名为Apache CouchDB,是一个开源的NoSQL数据库,由Apache软件基金会管理。CouchDB的主要特点是使用JSON作为存储格式,使用JavaScript作为查询语言(通过MapReduce函数),并…

面试二十二、跳表SkipLists

跳表全称为跳跃列表,它允许快速查询,插入和删除一个有序连续元素的数据链表。跳跃列表的平均查找和插入时间复杂度都是O(logn)。快速查询是通过维护一个多层次的链表,且每一层链表中的元素是前一层链表元素的子集(见右边的示意图&…

day07 51单片机-18B20温度检测

18B20温度检测 1.1 需求描述 本案例讲解如何从18B20传感器获取温度信息并显示在LCD上。 1.2 硬件设计 1.2.1 硬件原理图 1.2.3 18B20工作原理 可以看到18B20有两根引脚负责供电,一根引脚负责数据交换。18B20就是通过数据线和单片机进行数据交换的。 1&#xf…

前端项目中使用插件prettier/jscodeshift/json-stringify-pretty-compact格式化代码或json数据

同学们可以私信我加入学习群! 正文开始 前言一、json代码格式化-选型二、json-stringify-pretty-compact简单试用三、prettier在前端使用四、查看prettier支持的语言和插件五、使用prettier格式化vue代码最终效果如图: ![在这里插入图片描述](https://im…

【ruoyi-vue】登录解析(后端)

调试登录接口 进入实现类可以有 验证码校验 登录前置校验 用户验证 验证码校验 通过uuid获取redis 中存储的验证码信息,获取后对用户填写的验证码数据进行校验比对 用户验证 1.进入控制器的 /login 方法 2.进入security账号鉴权功能,经过jar内的流…

python逆向基础流程(纯小白教程)

一,例题链接 NSSCTF | 在线CTF平台 二,文件特征 使用工具查看文件信息,发现是pyinsatller打包的exe文件,如果硬用ida分析成汇编或c语言根本摸清楚程序的逻辑,所以思路是反编译成py文件直接分析python代码 三&#xf…

【论文推导】基于有功阻尼的转速环PI参数整定分析

前言 在学习电机控制的路上,PMSM的PI电流控制是不可避免的算法之一,其核心在于内环电流环、外环转速环的设置,来保证转速可调且稳定,并且保证较好的动态性能。整个算法仿真在《现代永磁同步电机控制原理及matlab仿真》中已详细给出…

VUE项目使用.env配置多种环境以及如何加载环境

第一步,创建多个环境配置文件 Vue CLI 项目默认使用 .env 文件来定义环境变量。你可以通过创建不同的 .env 文件来为不同环境设置不同的环境变量,例如: .env —— 所有模式共用.env.local —— 所有模式共用,但不会被 git 提交&…

Clickhouse离线安装教程

https://blog.51cto.com/u_15060531/4174350 1. 前置 1.1 检查服务器架构 服务器:Centos7.X 需要确保是否x86_64处理器构架、Linux并且支持SSE 4.2指令集 grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 …

不墨迹,向媒体投稿不讲攻略,直接上方法

作为一名单位信息宣传员,我曾深陷于向媒体投稿的泥沼之中,饱尝了费时费力、审核严苛、出稿缓慢的苦涩,承受着领导急切期盼与自我压力交织的煎熬。然而,当我有幸接触到智慧软文发布系统,这一切困境如同阴霾散去,取而代之的是便捷流畅的投稿流程,以及领导满意、团队轻松的工作氛围…

Java Swing游戏开发学习24

内容来自RyiSnow视频讲解 这一节讲的是Scrolling Message, Leveling Up, Damage Calculation滚动消息,升级,伤害计算。 伤害计算 玩家与怪的战斗,玩家对怪的伤害值为攻击值减去怪的防御值。 int damage attack - gp.monster[i].defense; …

队列的实现(c语言实现)

队列的定义 队列(Queue)是一种特殊的线性数据结构,它遵循先进先出(FIFO,First In First Out)的原则。这意味着最早被添加到队列中的元素将是最先被移除的元素。队列的主要操作包括入队(enqueue…

接口自动化测试框架建设的经验与教训

为什么选择这个话题? 一是发现很多“点工”在转型迷茫期都会问一些自动化测试相关的问题,可以说自动化测试是“点工”升级的必经之路;二是Google一下接口自动化测试,你会发现很多自动化测试框架相关的文章,但是大部分…

Nodejs--异步编程

异步编程 函数式编程 高阶函数 在通常的语言中,函数的参数只接受基本的数据类型或者是对象引用,返回值只能是基本数据类型和对象引用。 function foo(x) {return x }高阶函数是把函数作为参数,将函数作为返回值的函数 function foo(x) {…

Oceanbase体验之(二)Oceanbase集群的搭建(社区版4.2.2)

资源规划 3台observer CPU:4C及以上 内存:32G及以上 硬盘操作系统500G 存储盘1T及以上 虚拟机可以直接划分,物理机需要提前规划好资源 一、上传oceanbase安装包 登录ocp选择软件包管理 上传Oceanbase软件包(软件包获取路径 官网免费下载社…

JavaWeb--04YApi,Vue-cli脚手架Node.js环境搭建,创建第一个Vue项目

04 1 Yapi2 Vue-cli脚手架Node.js环境搭建配置npm的全局安装路径 3 创建项目(这个看下一篇文章吧) 1 Yapi 前后端分离中的重要枢纽"接口文档",以下一款为Yapi的接口文档 介绍:YApi 是高效、易用、功能强大的 api 管理平台&#…

Hive主要介绍

Hive介绍 hive是基于 Hadoop平台操作 HDFS 文件的插件工具 可以将结构化的数据文件映射为一张数据库表 可以将 HQL 语句转换为 MapReduce 程序 1.hive 是由驱动器组成,驱动器主要由4个组件组成(解析器、编译器、优化器、执行器) 2.hive本身不…

递归排列枚举(c++)

全部排列问题 输入 n 输出 1…n 个数的全部排列。全部排列中,数字可以重复 。 例如 输入 3 输出全部排列的结果如下:111、112、113、121、122、123、131、132、133、211、212、213、221、222、223、231、232、233、311、312、313、321、322、323、33…

4.18.2 EfficientViT:具有级联组注意力的内存高效Vision Transformer

现有Transformer模型的速度通常受到内存低效操作的限制,尤其是MHSA(多头自注意力)中的张量整形和逐元素函数。 设计了一种具有三明治布局的新构建块,即在高效FFN(前馈)层之间使用单个内存绑定的MHSA&#x…