PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

This article was first published on the WeChat official account 机器感知.

PTQ4SAM: Post-Training Quantization for Segment Anything

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-quantized normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax th......
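The "mathematically equivalent sign operation" can be illustrated with a small sketch. It assumes, since the truncated abstract does not spell this out, that a per-channel sign of +1 or -1 is applied to both the query and the key activations, so that the product Q K^T is unchanged while the sign-flipped key channels share one mode. The helper names and the rule for picking signs (flip a channel whose mean is negative) are hypothetical and chosen only for illustration.

```python
import random


def matmul_qkt(q, k):
    """Return Q K^T for two lists of token vectors."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]


def channel_signs(k):
    """Hypothetical rule: use -1 for a channel whose mean over tokens is negative."""
    n = len(k)
    return [-1.0 if sum(row[c] for row in k) / n < 0 else 1.0
            for c in range(len(k[0]))]


def apply_signs(x, signs):
    """Multiply every channel of every token by its sign."""
    return [[v * s for v, s in zip(row, signs)] for row in x]


random.seed(0)
tokens, channels = 4, 6
q = [[random.gauss(0, 1) for _ in range(channels)] for _ in range(tokens)]
# Toy keys: channels cluster around +3 or -3, so the tensor as a whole is bimodal.
centers = [3.0 if c % 2 == 0 else -3.0 for c in range(channels)]
k = [[random.gauss(centers[c], 0.5) for c in range(channels)] for _ in range(tokens)]

signs = channel_signs(k)
q_flipped = apply_signs(q, signs)
k_flipped = apply_signs(k, signs)

# Because s * s == 1 for every channel, the attention logits are unchanged.
before = matmul_qkt(q, k)
after = matmul_qkt(q_flipped, k_flipped)
max_diff = max(abs(a - b) for ra, rb in zip(before, after) for a, b in zip(ra, rb))
flipped_means = [sum(row[c] for row in k_flipped) / tokens for c in range(channels)]
```

In a real model the signs would presumably be folded into the weights of the query and key linear layers ahead of time, which is one plausible reading of "offline" in the abstract.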

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the n......

Matten: Video Generation with Mamba-Attention

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten has competitive performance with the current Transformer-based and GAN-based models in benchmark performance, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten. ......

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

Motion style transfer is a significant research direction in multimedia applications. It enables rapid switching between different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most current work in this field adopts GANs, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply the Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD......

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out-of-the-box without requiring retraining. We experiment using our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks and demonstrate a greater speedup of 2.73x - 7.63x while retaining 98.6% - 99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/. ......
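The quadratic cost mentioned in the abstract comes from the dense score matrix that standard self-attention builds. The sketch below is plain dense attention in pure Python, not the IceFormer method, which the abstract does not describe. The function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def naive_self_attention(q, k, v):
    """Dense attention: every query is scored against every key (n x n scores)."""
    d = len(q[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
              for qi in q]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * vj[c] for w, vj in zip(wrow, v)) for c in range(len(v[0]))]
           for wrow in weights]
    return out, weights


def score_entries(n):
    """Number of query-key scores that dense attention computes and stores."""
    return n * n


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = naive_self_attention(x, x, x)
```

Doubling the sequence length quadruples `score_entries`, which is why long inputs are costly in both time and memory on CPU-only hardware.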

Efficient Text-driven Motion Generation via Latent Consistency Training

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are challenging to run in real time because the multi-step sampling mechanism involves tens or hundreds of repeated function evaluations. To this end, we investigate motion latent consistency training (MLCT) for motion generation to reduce the computation and time consumed during iterative inference. It applies diffusion pipelines to low-dimensional motion latent spaces to mitigate the computational burden of each function evaluation. Explaining the diffusion process with probability flow ordinary differential equation (PF-ODE) theory, MLCT allows inference in extremely few steps between the prior distribution and the motion latent representation distribution by maintaining consistency of the outputs over the PF-ODE trajectory. In particular, we introduce a quantization constraint to optimize motion latent r......

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy experiment by comparing a U-Net architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage amid the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention and bring further improvements despite a considerable amount of reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the ......
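To see why downsampling tokens before self-attention saves computation, consider a rough count of query-key pairs. The sketch assumes a 2x stride along each spatial axis and uses simple strided subsampling as a stand-in. The abstract does not state the factor or the downsampling operator, so both are assumptions.

```python
def strided_downsample(grid, stride=2):
    """Keep every `stride`-th token along both axes (illustrative stand-in only)."""
    return [row[::stride] for row in grid[::stride]]


def attention_pairs(num_tokens):
    """Query-key pairs scored by dense self-attention over `num_tokens` tokens."""
    return num_tokens * num_tokens


height, width = 16, 16
# Each token is represented by its (row, column) position on the latent grid.
grid = [[(r, c) for c in range(width)] for r in range(height)]
small = strided_downsample(grid)

full_tokens = height * width
small_tokens = len(small) * len(small[0])
reduction = attention_pairs(full_tokens) // attention_pairs(small_tokens)
```

Under these assumptions the token count drops by a factor of 4 and the number of attention pairs by a factor of 16.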

From Generalization Analysis to Optimization Designs for State Space Models

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical experiments are conducted to validate our results. ......
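For readers unfamiliar with SSMs, a minimal scalar sketch may help. It assumes the common discrete-time linear form h_t = a * h_(t-1) + b * u_t, y_t = c * h_t. The abstract does not define the model, and the paper's parameterization may differ.

```python
def ssm_scan(a, b, c, inputs, h0=0.0):
    """Run a scalar linear SSM: h_t = a * h_(t-1) + b * u_t, y_t = c * h_t."""
    h = h0
    outputs = []
    for u in inputs:
        h = a * h + b * u       # state update
        outputs.append(c * h)   # readout
    return outputs


# Impulse input: the response decays geometrically at rate a.
impulse_response = ssm_scan(0.5, 1.0, 2.0, [1.0, 0.0, 0.0])
# Constant input: the output scale grows toward c * b / (1 - a).
constant_response = ssm_scan(0.5, 1.0, 2.0, [1.0, 1.0, 1.0])
```

The two inputs give outputs of different scale under the same parameters, which is a small instance of the interplay between SSM parameters and temporal patterns that the abstract refers to.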

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: Correcting predictions from residuals and hierarchical learning. However, these models do not show the enhancement of prediction skills on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and we......
