Imagine Flash、StyleMamba 、FlexControl、Multi-Scene T2V、TexControl

本文首发于公众号:机器感知

Imagine Flash、StyleMamba 、FlexControl、Multi-Scene T2V、TexControl

图片

You Only Cache Once: Decoder-Decoder Architectures for Language Models

图片

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO only caches once. The design substantially reduces GPU memory demands, yet retains global attention capability. Additionally, the computation flow enables prefilling to early exit without changing the final output, thereby significantly speeding up the prefill stage. Experimental results demonstrate that YOCO achieves favorable performance compared to Transformer in various settings of scaling up model size and number of training tokens. We also extend YOCO to 1M context length with near-perfect needle retrieval accuracy. The profiling results show that YOCO improves inference memory, p......

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

图片

Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different den......

Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

图片

Diffusion models are a powerful generative framework, but come with expensive inference. Existing acceleration methods often compromise image quality or fail under complex conditioning when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps. Our approach comprises three key components: (i) Backward Distillation, which mitigates training-inference discrepancies by calibrating the student on its own backward trajectory; (ii) Shifted Reconstruction Loss that dynamically adapts knowledge transfer based on the current time step; and (iii) Noise Correction, an inference-time technique that enhances sample quality by addressing singularities in noise prediction. Through extensive experiments, we demonstrate that our method outperforms existing competitors in quantitative metrics and human evaluations. Remarkably, it achieves performance comparable......

Reviewing Intelligent Cinematography: AI research for camera-based video production

图片

This paper offers a comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Considering the breadth of computer vision research and the lack of review papers tied to intelligent cinematography (IC), this review introduces a holistic view of the IC landscape while providing the technical insight for experts across across disciplines. We preface the main discussion with technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, and link explanatory articles to assist non-technical readers. The main discussion categorizes work by four production types: General Production, Virtual Production, Live Production and Aerial Production. Note that for Virtual Production we do not discuss research relating to virtual content acquisition, including work on automated video generation, like Stable Di......

StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer

图片

We present StyleMamba, an efficient image style transfer framework that translates text prompts into corresponding visual styles while preserving the content integrity of the original images. Existing text-guided stylization requires hundreds of training iterations and takes a lot of computing resources. To speed up the process, we propose a conditional State Space Model for Efficient Text-driven Image Style Transfer, dubbed StyleMamba, that sequentially aligns the image features to the target text prompts. To enhance the local and global style consistency between text and image, we propose masked and second-order directional losses to optimize the stylization direction to significantly reduce the training iterations by 5 times and the inference time by 3 times. Extensive experiments and qualitative evaluation confirm the robust and superior stylization performance of our methods compared to the existing baselines.......

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

图片

Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation. At the core of FlexEControl is a unique weight decomposition strategy, which allows for streamlined integration of various input types. This approach not only enhances the faithfulness of the generated image to the control, but also significantly reduces the computational overhead typically associated with multimodal conditioning. Our approach achieves a reduction of 41% in trainable parameters and 30% in memory usage compared with Uni-ControlNet. Moreover, it doubles data efficiency and can flexibly generate imag......

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

图片

Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., `a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since they are ubiquitous in the real-world (e.g., `a red panda climbing a tree' followed by `the red panda sleeps on the top of the tree'). To generate multi-scene videos from the pretrained T2V model, we introduce Time-Aligned Captions (TALC) framework. Specifically, we enhance the text-conditioning mechanism in the T2V architecture to recognize the temporal alignment between the video scenes and scene descriptions. For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., `a red pan......

TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model

图片

Deep learning-based sketch-to-clothing image generation provides the initial designs and inspiration in the fashion design processes. However, clothing generation from freehand drawing is challenging due to the sparse and ambiguous information from the drawn sketches. The current generation models may have difficulty generating detailed texture information. In this work, we propose TexControl, a sketch-based fashion generation framework that uses a two-stage pipeline to generate the fashion image corresponding to the sketch input. First, we adopt ControlNet to generate the fashion image from sketch and keep the image outline stable. Then, we use an image-to-image method to optimize the detailed textures of the generated images and obtain the final results. The evaluation results show that TexControl can generate fashion images with high-quality texture as fine-grained image generation.......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/608442.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

QT---day4事件

1、思维导图 2、 头文件 #ifndef MYWIDGET_H #define MYWIDGET_H #include <QWidget> #include<QIcon> //图标类 #include<QLabel> //标签类 #include<QMovie> //动图类 #include<QLineEdit> //行编辑器类 #include<QPushButton> //按钮…

AJAX知识点(前后端交互技术)

原生AJAX AJAX全称为Asynchronous JavaScript And XML,就是异步的JS和XML&#xff0c;通过AJAX可以在浏览器中向服务器发送异步请求&#xff0c;最大的优势&#xff1a;无需刷新就可获取数据。 AJAX不是新的编程语言&#xff0c;而是一种将现有的标准组合在一起使用的新方式 …

【进程等待】阻塞等待 | options非阻塞等待

目录 waitpid 阻塞等待 options&非阻塞等待 pid_t返回值 阻塞等待VS非阻塞等待 waitpid 回顾上篇&#xff1a; pid_ t waitpid(pid_t pid, int *status, int options); 返回值&#xff1a; 当正常返回的时候waitpid返回收集到的子进程的进程ID&#xff1b;如果设置了…

C++容器之vector类

目录 1.vector的介绍及使用1.1vector的介绍1.2vector的使用1.2.1 vector的定义1.2.2 vector iterator 的使用1.2.3 vector 空间增长问题1.2.4 vector 增删查改1.2.5vector 迭代器失效问题1.2.6 vector 在OJ中的使用。 2.vector深度剖析及模拟实现2.1 std::vector的核心框架接口…

金三银四面试题(二十五):策略模式知多少?

什么是策略模式 策略模式&#xff08;Strategy Pattern&#xff09;是一种行为型设计模式&#xff0c;旨在定义一系列算法&#xff0c;将每个算法封装到一个独立的类中&#xff0c;使它们可以互换。策略模式让算法的变化独立于使用它们的客户端&#xff0c;使得客户端可以根据…

车载测试系列:入行车载测试分享

车载测试前景如何&#xff1f; 软件定义汽车时代的发展趋势&#xff0c;随着控制器自主开发力度的加强&#xff0c;作为V流程中必备环节&#xff0c;车载测试工程师岗位需求会越来越多&#xff1b;控制器集成化&#xff0c;功能集成程度越来越高&#xff0c;对于测试工程师的知…

3. 初探MPI——(非阻塞)点对点通信

系列文章目录 初探MPI——MPI简介初探MPI——&#xff08;阻塞&#xff09;点对点通信初探MPI——&#xff08;非阻塞&#xff09;点对点通信初探MPI——集体通信 文章目录 系列文章目录前言一、Non-blocking communications1.1 Block version1.2 Non-blocking version 二、准…

思维导图软件哪个好?盘点这5款好用的工具!

思维导图作为一种有效的思维工具&#xff0c;在日常生活和工作中扮演着越来越重要的角色。无论是学习、工作规划&#xff0c;还是项目管理&#xff0c;思维导图都能帮助我们更好地组织思路&#xff0c;提升工作效率。然而&#xff0c;市面上众多的思维导图软件让人眼花缭乱&…

软件系统工程建设全套资料(交付清单)

软件全套精华资料包清单部分文件列表&#xff1a; 工作安排任务书&#xff0c;可行性分析报告&#xff0c;立项申请审批表&#xff0c;产品需求规格说明书&#xff0c;需求调研计划&#xff0c;用户需求调查单&#xff0c;用户需求说明书&#xff0c;概要设计说明书&#xff0c…

C++类和对象(4)

目录 1.初始化列表 2.单参数里面的隐式类型转换 3.多参数的隐式类型转换 4.匿名对象 1.初始化列表 &#xff08;1&#xff09;首先看一下初始化列表具体是什么&#xff1f; 这个就是初始化列表的具体形式&#xff0c;对&#xff0c;你没有看错&#xff0c;这个初始化列表里…

python:画折线图

import pandas as pd import matplotlib.pyplot as plt from matplotlib.font_manager import FontProperties# 设置新宋体字体的路径 font_path D:/reportlab/simsun/simsun.ttf# 加载新宋体字体 prop FontProperties(fnamefont_path)""" # 读取 xlsx 文件 d…

【投资必看】充电桩加盟合作哪家好,充电桩厂家合作模式一般有哪些?

随着新能源汽车行业的蓬勃发展&#xff0c;充电桩作为关键的基础设施&#xff0c;其市场需求日益增长。对于有意进入这一行业的投资者来说&#xff0c;了解和选择适合的合作模式至关重要。充电桩厂家的合作模式一般有哪些&#xff0c;本文将从设备销售和投资运营两个维度进行讨…

容灾演练双月报|郑大一附院数据级容灾演练切换

了解更多灾备行业动态 守护数字化时代业务连续 目录 CONTENTS 01 灾备法规政策 02 热点安全事件 03 容灾演练典型案例 01 灾备法规政策 3月19日&#xff0c;工信部发布《工业和信息化部办公厅关于做好2024年信息通信业安全生产和网络运行安全工作的通知》。明确提出“…

官宣:vAsterNOS正式发布!开放网络操作系统免费试用!

近期&#xff0c;vAsterNOS&#xff08;设备模拟器&#xff09;正式发布&#xff0c;可以满足用户快速了解 AsterNOS、体验实际操作、搭建模拟网络的需求&#xff0c;可运行在GNS3、EVE-NG等网络虚拟软件中。 AsterNOS 网络操作系统是星融元为人工智能、机器学习、高性能计算、…

java培训班还值得去培训吗?

请大家关注我的公众号&#xff1a;老胡聊Java 1 应届生或者在校生&#xff0c;如果感觉有必要&#xff0c;可以去提升下技术&#xff0c;因为应届生或在校生找工作时&#xff0c;未必要提升真实项目经验&#xff0c;所以用应届生身份学到的spring boot等java技术背面试题&#…

《二十三》Qt 简单小项目---视频播放器

QT 使用QMediaPlayer实现的简易视频播放器 效果如下&#xff1a; 功能点 播放指定视频点击屏幕暂停/播放开始/暂停/重置视频拖拽到指定位置播放 类介绍 需要在配置文件中加入Multimedia, MultimediaWidgets这俩个库。 Multimedia&#xff1a;提供了一套用于处理音频、视频…

如何开启深色模式【攻略】

如何开启深色模式【攻略】 前言版权推荐如何开启深色模式介绍手机系统手机微信手机QQ手机快手手机抖音 电脑系统电脑微信电脑QQ电脑WPS电脑浏览器 最后 前言 2024-5-9 20:48:21 深色模式给人以一种高级感。 本文介绍一些常用软件深色模式的开启 以下内容源自《【攻略】》 仅…

基于Spring Boot的酒店管理系统设计与实现

基于Spring Boot的酒店管理系统设计与实现 开发语言&#xff1a;Java 框架&#xff1a;springboot JDK版本&#xff1a;JDK1.8 数据库工具&#xff1a;Navicat11 开发软件&#xff1a;eclipse/myeclipse/idea 系统部分展示 系统首页界面图&#xff0c;在系统首页可以查看首页…

【数据结构-二叉搜索树的增删查改】

&#x1f308;个人主页&#xff1a;努力学编程’ ⛅个人推荐&#xff1a;基于java提供的ArrayList实现的扑克牌游戏 |C贪吃蛇详解 ⚡学好数据结构&#xff0c;刷题刻不容缓&#xff1a;点击一起刷题 &#x1f319;心灵鸡汤&#xff1a;总有人要赢&#xff0c;为什么不能是我呢 …

python-类和对象

1、设计一个 Circle类来表示圆,这个类包含圆的半径以及求面积和周长的函数。再使用这个类创建半径为1~10的圆,并计算出相应的面积和周长。 &#xff08;1&#xff09;源代码&#xff1a; import math class Circle: def __init__(self, r): self.r r #面积 def area(self): r…