GenVideo、SkelFormer、EfficientGS、HOLD、Motion Synthesis、Learn2Talk

本文首发于公众号:机器感知

GenVideo、SkelFormer、EfficientGS、HOLD、Motion Synthesis、Learn2Talk

图片

Enabling Stateful Behaviors for Diffusion-based Policy Learning

图片

While imitation learning provides a simple and effective framework for policy learning, acquiring consistent actions during robot execution remains a challenging task. Existing approaches primarily focus on either modifying the action representation at data curation stage or altering the model itself, both of which do not fully address the scalability of consistent action generation. To overcome this limitation, we introduce the Diff-Control policy, which utilizes a diffusion-based model to learn the action representation from a state-space modeling viewpoint. We demonstrate that we can reduce diffusion-based policies' uncertainty by making it stateful through a Bayesian formulation facilitated by ControlNet, leading to improved robustness and success rates. Our experimental results demonstrate the significance of incorporating action statefulness in policy learning, where Diff-Control shows improved performance across various tasks. Specifically, Diff-Control achieves an ave......

GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I  Diffusion Models

图片

Video editing methods based on diffusion models that rely solely on a text prompt for the edit are hindered by the limited expressive power of text prompts. Thus, incorporating a reference target image as a visual guide becomes desirable for precise control over edit. Also, most existing methods struggle to accurately edit a video when the shape and size of the object in the target image differ from the source object. To address these challenges, we propose "GenVideo" for editing videos leveraging target-image aware T2I models. Our approach handles edits with target objects of varying shapes and sizes while maintaining the temporal consistency of the edit using our novel target and shape aware InvEdit masks. Further, we propose a novel target-image aware latent noise correction strategy during inference to improve the temporal consistency of the edits. Experimental analyses indicate that GenVideo can effectively handle edits with objects of varying shapes, where existing appr......

Does Gaussian Splatting need SFM Initialization?

图片

3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome. To this end, we investigate various initialization strategies for Gaussian Splatting and delve into how volumetric reconstructions from Neural Radiance Fields (NeRF) can be utilized to bypass the dependency on SFM data. Our findings demonstrate that random initialization can perform much better if carefully designed and that by employing a combination of improved initialization strategies and structure distillation from low-cost NeRF models, it is possible to achieve equivalent results, or at times even superior, to those obtained from SFM initialization. ......

SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal  Transformers

图片

We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhance the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings using three datasets, and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the mo......

Improving Chinese Character Representation with Formation Tree

图片

Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, which requires models to handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome these issues, achieving progress in recognizing unseen characters. However, these approaches fail to fully exploit the inherent tree structure of such sequences. To address these limitations and leverage established data properties, we propose Formation Tree-CLIP (FT-CLIP). This model utilizes formation trees to represent characters and incorporates a dedicated tree encoder, significantly improving performance in both seen and unseen character recognition tasks. We further introduce masking for to both character images and tree nodes, enabling efficient and effective training. This ......

EfficientGS: Streamlining Gaussian Splatting for Large-Scale  High-Resolution Scene Representation

图片

In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k$\times$4k pixels) is hindered by the excessive computational requirements for managing a large number of Gaussians. Addressing this, we introduce 'EfficientGS', an advanced approach that optimizes 3DGS for high-resolution, large-scale scenes. We analyze the densification process in 3DGS and identify areas of Gaussian over-proliferation. We propose a selective strategy, limiting Gaussian increase to key primitives, thereby enhancing the representational efficiency. Additionally, we develop a pruning mechanism to remove redundant Gaussians, those that are merely auxiliary to adjacent ones. For further enhancement, we integrate a sparse order increment for Spherical Harmonics (SH), designed to alleviate storage constraints and reduce training overhead. Our empirical evaluations, conducted on a ra......

Generative Modelling with High-Order Langevin Dynamics

图片

Diffusion generative modelling (DGM) based on stochastic differential equations (SDEs) with score matching has achieved unprecedented results in data generation. In this paper, we propose a novel fast high-quality generative modelling method based on high-order Langevin dynamics (HOLD) with score matching. This motive is proved by third-order Langevin dynamics. By augmenting the previous SDEs, e.g. variance exploding or variance preserving SDEs for single-data variable processes, HOLD can simultaneously model position, velocity, and acceleration, thereby improving the quality and speed of the data generation at the same time. HOLD is composed of one Ornstein-Uhlenbeck process and two Hamiltonians, which reduce the mixing time by two orders of magnitude. Empirical experiments for unconditional image generation on the public data set CIFAR-10 and CelebA-HQ show that the effect is significant in both Frechet inception distance (FID) and negative log-likelihood, and achieves the ......

MCM: Multi-condition Motion Synthesis Framework

图片

Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framework, termed MCM, based on a dual-branch structure composed of a main branch and a control branch. This framework effectively extends the applicability of the diffusion model, which is initially predicated solely on textual conditions, to auditory conditions. This extension encompasses both music-to-dance and co-speech HMS while preserving the intrinsic quality of motion and the capabilities for semantic association inherent in the original model. Furthermore, we propose the implementation of a Transformer-based diffusion model, designated as MWNet, as the main branch. This model adeptl......

Learn2Talk: 3D Talking Face Learns from 2D Talking Face

图片

Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield more 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex ......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/564543.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

优维应用级数字化架构管理:让企业运维天堑变通途

在优维科技的产品视角中,数字化架构管理就像是一门精妙的艺术,它将上层应用模型的业务概念以可视化的方式呈现出来,使得业务逻辑和流程变得更加直观、清晰。我们将这样的管理方式理解为“给企业搭起一座桥梁”——在这座桥梁的搭建过程中&…

Langchain入门到实战-第四弹

Langchain入门到实战 Langchain中的提示词官网地址Langchain概述Langchain的提示词用法更新计划 Langchain中的提示词 语言模型提示模板是预定义的生成语言模型提示的方法。模板可能包括指令、少样本示例、特定任务的上下文和问题。LangChain 提供了创建和处理提示模板的工具。…

BMR:基于Boostrapping多视图的虚假新闻检测

一、概述 文章提出了三种视图信息来表示一篇新闻:文本、图像结构、图像语义。然后设计了改进的多门混合专家系统(iMMoE)来进行信息融合。保留单模态信息来保证特征对新闻的保真性,增加的多模态信息能保证不同模态的一致性&#xf…

【python】如何通过python来发送短信

✨✨ 欢迎大家来到景天科技苑✨✨ 🎈🎈 养成好习惯,先赞后看哦~🎈🎈 🏆 作者简介:景天科技苑 🏆《头衔》:大厂架构师,华为云开发者社区专家博主,…

(2024,DiffEdit,掩码,潜在噪声校正)GenVideo:使用 T2I 扩散模型进行单样本目标图像和形状感知视频编辑

GenVideo: One-shot target-image and shape aware video editing using T2I diffusion models 公和众和号:EDPJ(进 Q 交流群:922230617 或加 VX:CV_EDPJ 进 V 交流群) 目录 0. 摘要 3. 方法 3.1. 对源视频进行微调…

用Python自动化操作PPT,看完这篇文章就够了!

1.PPT自动化能干什么?有什么优势? 它可以代替你自动制作PPT它可以减少你调整用于调整PPT格式的时间它可以让数据报告风格一致总之就是:它能提高你的工作效率!让你有更多时间去做其他事情! 2.使用win32com操作ppt 官…

【Linux基础】Linux基础概念

目录 前言 浅谈什么是文件? Linux下目录结构的认识及路径 目录结构 路径 家目录 什么是递归式的删除 重定向 输出重定向: 追加重定向: 输入重定向: 命令行管道 shell外壳 为什么需要shell外壳? shell外壳…

Jetpack Bluetooth蓝牙MODE,这个项目使用Jetpack Bluetooth库来实现我们用于开发的一些日常功能

Jetpack蓝牙演示,这个项目使用Jetpack Bluetooth库来实现我们用于开发的一些日常功能[搜索,连接,发现服务,蓝牙操作[读,写,通知]]。 AndroidX蓝牙是Jetpack库套件的新增功能。虽然目前处于阿尔法阶段&…

【华为OD笔试】2024D卷机考套题汇总【不断更新,限时免费】

有LeetCode算法/华为OD考试扣扣交流群可加 948025485 可上全网独家的 欧弟OJ系统 练习华子OD、大厂真题 绿色聊天软件戳 od1441了解算法冲刺训练(备注【CSDN】否则不通过) 文章目录 2024年4月17日(2024D卷)2024年4月18日&#xff…

【创作活动】2023年图灵奖

2023年图灵奖揭晓,你怎么看? 2023年图灵奖,最近刚刚颁给普林斯顿数学教授 Avi Wigderson!作为理论计算机科学领域的领军人物,他对于理解计算中的随机性和伪随机性的作用,作出了开创性贡献。提醒&#xff1…

前端学习<四>JavaScript基础——42-事件的传播和事件冒泡

DOM事件流 事件传播的三个阶段是:事件捕获、事件冒泡和目标。 事件捕获阶段:事件从祖先元素往子元素查找(DOM树结构),直到捕获到事件目标 target。在这个过程中,默认情况下,事件相应的监听函数…

CCF PTA 2023年11月C++卫星发射

【问题描述】 在 2050 年卫星发射技术已经得到极大发展,我国将援助 A 国建立远轨道卫星导航系统,该项目计划第 一个天发射一颗卫星;之后两天(第二天和第三天),每天发射两颗卫星;之后三天&#…

4.点云数据的配准

1.点云配准ICP(Iterative Closest Point)算法 点云配准的原理及ICP(Iterative Closest Point)算法原理参照博客【PCL】—— 点云配准ICP(Iterative Closest Point)算法_icp点云配准-CSDN博客。 (1)点云配准原理:三维扫描仪设备对目标物体一…

Spring Cloud Gateway详细介绍以及实现动态路由

一. 简介 Spring Cloud Gateway This project provides a libraries for building an API Gateway on top of Spring WebFlux or Spring WebMVC. Spring Cloud Gateway aims to provide a simple, yet effective way to route to APIs and provide cross cutting concerns to …

Mysql学习大纲

文章目录 整体大纲总结 整体大纲 大纲 MySQL在金融互联网行业的企业级安装部署mysql启动关闭原理和实战,及常见错误排查 花钱9.9 订阅了专栏MySQL字符集和校对规则史上最详细的Mysql用户权原理和实战,生产案例InnoDB引擎原理和实战,通俗易懂…

[C++][算法基础]求组合数(II)

给定 𝑛 组询问,每组询问给定两个整数 𝑎,𝑏,请你输出 的值。 输入格式 第一行包含整数 𝑛。 接下来 𝑛 行,每行包含一组 𝑎 和 𝑏。 输出格…

vue3左树的全选和反选

<el-input v-model"filterText" placeholder"" style"width: 48%"/><el-button type"primary" click"handleSearch" class"ml-2">查找</el-button><el-radio-group v-model"form.choic…

一文扫盲(5):实验室管理系统的界面设计

本次带来第5期&#xff1a;实验室管理系统的设计&#xff0c;从系统定义、功能模块、界面构成和设计着力点四个方面讲解&#xff0c;大千UI工场愿意持续和大家分享&#xff0c;欢迎关注、点赞、转发。 一、什么是实验室管理系统 实验室管理系统是一种用于管理和监控实验室运作…

【C++】友元--最全解析(友元是什么?我们应该如何理解友元?友元可以应用在那些场景?)

目录 一、前言 二、友元是什么&#xff1f; 三、友元的感性理解和分类 &#x1f95d;友元的感性理解 &#x1f34b;友元的三种分类 ✨友元 --- 全局函数 ✨友元 --- 成员函数 ✨友元 --- 类 四、友元函数的应用场景 &#x1f34d;操作符重载 :"<<" 与…

Nacos的介绍和使用Docker、MySQL持久化挂载安装

文章目录 Nacos的介绍和使用Docker、MySQL持久化挂载安装一、Nacos的介绍二、使用Docker和MySQL进行持久化安装1、选择想要使用的MySQL服务器&#xff0c;创建一个数据库nacos-config&#xff0c;然后运行下面sql2、在linux下的opt文件夹下创建 /opt/nacos/data文件夹 和 /opt/…