LD-Pruner、EdgeFusion(On-Device T2I)、FreeDiff、TextCenGen、MemLLM

This article was first published on the WeChat official account 机器感知 (Machine Perception).

https://mp.weixin.qq.com/s/KiyNfwYWU-wBiCO-hE9qkA


The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models


Foundation models, pre-trained on large amounts of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of a particular intermediate layer. Following this surprising observation, we propose Zip, which zips up CLIP and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocab......
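
The surprising part is that an off-the-shelf CLIP, with no instance-level supervision, yields boundary-aware clusters at one intermediate layer. Below is a minimal probe of that idea, a sketch assuming an arbitrary layer index and cluster count (the paper identifies the specific layer):

```python
# Sketch: cluster patch tokens from an intermediate CLIP layer; cluster
# transitions roughly trace object boundaries. Layer 6 and k=8 are assumptions.
import torch
from transformers import CLIPVisionModel, CLIPImageProcessor
from sklearn.cluster import KMeans
from PIL import Image

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # any RGB image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

feats = out.hidden_states[6][0, 1:]              # drop CLS token -> (49, 768)
labels = KMeans(n_clusters=8, n_init=10).fit_predict(feats.numpy())
print(labels.reshape(7, 7))                      # 224/32 = 7 patches per side
```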

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights


Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains complex, posing challenges in memory consumption and inference speed. To address this, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward, task-agnostic way to evaluate model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to effectively quantify the impact of pruning on model performance independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster co......
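
The task-agnostic evaluation they describe boils down to: ablate a candidate component and measure how far the latent output drifts. A hedged sketch on a toy network, where zeroing a module's output and an L2 distance stand in for whatever LD-Pruner actually uses:

```python
# Toy sketch: score each module by how far zeroing its output moves the
# "latent" prediction on calibration inputs. Metric and ablation are assumptions.
import torch
import torch.nn as nn

@torch.no_grad()
def pruning_impact(model, module_name, x):
    baseline = model(x)
    module = dict(model.named_modules())[module_name]
    handle = module.register_forward_hook(lambda m, inp, out: torch.zeros_like(out))
    ablated = model(x)
    handle.remove()
    return (baseline - ablated).flatten(1).norm(dim=1).mean().item()

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
x = torch.randn(8, 16)  # stand-in for latents from a calibration set
scores = {n: pruning_impact(model, n, x) for n, _ in model.named_modules() if n}
print(sorted(scores, key=scores.get))  # lowest-impact modules = pruning candidates
```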

EdgeFusion: On-Device Text-to-Image Generation

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research has focused on methods to reduce sampling steps, such as the Latent Consistency Model (LCM), and on architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start with a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices......
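
For orientation, this is roughly what 2-step sampling looks like with diffusers' LCMScheduler on top of the BK-SDM base. Note that the UNet must actually be LCM-distilled (as EdgeFusion does) for 2 steps to produce usable images; with vanilla BK-SDM weights this only exercises the API:

```python
# Sketch of 2-step LCM-style inference on a compact SD backbone. An LCM-distilled
# BK-SDM checkpoint is assumed; "nota-ai/bk-sdm-small" here is the base model.
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a photo of a corgi on a beach, high detail",
    num_inference_steps=2,   # only meaningful with LCM-distilled UNet weights
    guidance_scale=1.0,      # LCM typically runs without classifier-free guidance
).images[0]
image.save("out.png")
```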

SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up


Prompt-tuning methods have shown performance comparable to parameter-efficient fine-tuning (PEFT) methods on various natural language understanding tasks. However, existing prompt-tuning methods still utilize the entire model architecture and thus fail to accelerate inference in deployment. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which is extremely efficient at inference time. Our method significantly enhances inference efficiency by investigating and utilizing a skill-localized subnetwork in a language model. Surprisingly, our method improves inference speed by up to 160% while pruning 52% of the parameters. Furthermore, we demonstrate that our method is applicable across various transformer-based architectures, thereby confirming its practicality and scalability. ......
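
A sketch of the general shape of skill localization: rank FFN neurons by importance on task data and keep only a fraction, here roughly matching the 52% pruning ratio. The mean-activation criterion is an illustrative stand-in for SKIP's actual localization method:

```python
# Toy sketch: keep only the FFN neurons most active on task data, then rebuild
# a smaller FFN around them. Importance criterion is an assumption.
import torch
import torch.nn as nn

def localize_and_prune(ffn: nn.Sequential, calib: torch.Tensor, keep: float = 0.48):
    hidden = torch.relu(calib @ ffn[0].weight.T + ffn[0].bias)   # (N, d_ff)
    importance = hidden.abs().mean(0)
    idx = importance.topk(int(keep * hidden.shape[1])).indices
    pruned = nn.Sequential(
        nn.Linear(ffn[0].in_features, len(idx)),
        nn.ReLU(),
        nn.Linear(len(idx), ffn[2].out_features),
    )
    pruned[0].weight.data = ffn[0].weight.data[idx]
    pruned[0].bias.data = ffn[0].bias.data[idx]
    pruned[2].weight.data = ffn[2].weight.data[:, idx]
    pruned[2].bias.data = ffn[2].bias.data.clone()
    return pruned

ffn = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
small = localize_and_prune(ffn, torch.randn(32, 64))
print(small)   # same interface, ~52% of the hidden neurons removed
```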

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding


With large language models (LLMs) now widely deployed for long-content generation, there is growing demand for efficient long-sequence inference. However, the key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck, growing linearly in size with sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache is loaded for every generated token, resulting in low utilization of computational cores and high latency. While various KV-cache compression methods have been proposed to alleviate this issue, they suffer from degradation in generation quality. We introduce TriForce, a hierarchical speculative decoding system that is scalable to long sequence generation. This approach leverages the original model weights and a dynamic sparse KV cache via retrieval as a draft model, which serves as an intermediate layer in the hierarchy and is further speculated by a smaller model to reduce its......
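
Stripped of the hierarchy, the engine is a draft-then-verify loop. A toy sketch of greedy speculative decoding, lossless with respect to greedy target decoding (TriForce adds a second draft level and a retrieval-sparsified KV cache, and uses rejection-sampling verification for sampled decoding):

```python
# Toy sketch of one speculative-decoding round: draft k tokens cheaply, verify
# with the target model, accept the agreeing prefix plus one correction.
import torch

def toy_lm(seq, seed):
    """Deterministic stand-in for a language model over a 100-token vocab."""
    g = torch.Generator().manual_seed(seed + sum(seq))
    return torch.randn(100, generator=g)

def speculative_step(draft_fn, target_fn, prefix, k=4):
    seq = list(prefix)
    for _ in range(k):                            # 1) draft k tokens greedily
        seq.append(int(draft_fn(seq).argmax()))
    proposed = seq[len(prefix):]
    accepted = []
    for tok in proposed:                          # 2) verify with the target
        t = int(target_fn(prefix + accepted).argmax())
        accepted.append(t)
        if t != tok:                              # first disagreement ends the round
            break
    return list(prefix) + accepted

out = speculative_step(lambda s: toy_lm(s, 0), lambda s: toy_lm(s, 1), [1, 2, 3])
print(out)
```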

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models


Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face a pivotal challenge: misalignment between the intended precise editing target regions and the broader area actually affected by the guidance in practice. Although excellent methods leveraging attention mechanisms have been developed to refine the editing guidance, these approaches require complex modifications to the network architecture and are limited to specific editing tasks. In this work, we re-examine the diffusion process and the misalignment problem from a frequency perspective, revealing that, due to the power law of natural images and the decaying noise schedule, the denoising network primarily recovers low-frequency image components during the earlier timesteps and thus brings excessive low-frequency signals into the editing guidance. Leveraging this insight, we introduce a novel fine-tuning-free a......
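
The actionable insight is a high-pass filter on the guidance signal. A minimal sketch of frequency truncation of a (C, H, W) guidance map, with the radial cutoff and filter shape as illustrative assumptions (the paper schedules the truncation progressively over timesteps):

```python
# Sketch: remove low-frequency content from a guidance map via FFT masking.
import torch

def truncate_low_freq(guidance: torch.Tensor, cutoff: float) -> torch.Tensor:
    """guidance: (C, H, W); cutoff: radius as a fraction of the spectrum."""
    f = torch.fft.fftshift(torch.fft.fft2(guidance), dim=(-2, -1))
    _, H, W = guidance.shape
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    keep_high = (yy**2 + xx**2).sqrt() >= cutoff   # True outside the cutoff disc
    f = f * keep_high
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

g = torch.randn(4, 64, 64)            # stand-in: CFG delta in latent space
g_hp = truncate_low_freq(g, cutoff=0.2)
print(g_hp.shape)
```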

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation


Recent advancements in text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces dynamic adaptation of the blank region for text-friendly image generation, emphasizing text-centric design and visual harmony. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for pre-defined text areas, even for text or icons placed at the golden ratio. Observing how cross-attention maps affect object placement, we detect and repel conflicting objects using a force-directed graph approach, combined with a Spatial Excluding Cross-Attention Constraint for smooth attention in whitespace areas. On this novel graphic-design task, experiments indicate that TextCenGen outperforms existing methods with......
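
One way to read the force-directed guidance: attention mass that conflicting objects place inside the reserved text box is penalized, and its gradient pushes them away. A hedged sketch of such a penalty, not TextCenGen's exact formulation:

```python
# Sketch: penalize cross-attention mass inside the reserved text box; its
# gradient (taken w.r.t. latents in a real pipeline) repels conflicting objects.
import torch

def whitespace_penalty(attn: torch.Tensor, box) -> torch.Tensor:
    """attn: (tokens, H, W) cross-attention maps; box: (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    return attn[:, y0:y1, x0:x1].sum()

attn = torch.rand(8, 32, 32, requires_grad=True)  # stand-in attention maps
loss = whitespace_penalty(attn, (10, 22, 8, 24))
loss.backward()                                   # gradient steers objects away
print(loss.item())
```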

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach


The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with ......
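
The unidirectional attention between prompts can be pictured as an asymmetric mask; under one reading of the (truncated) abstract, base prompts are shielded from novel prompts so base-class behavior is preserved. A small sketch of that mask:

```python
# Sketch: an asymmetric attention mask over [base | novel] prompt tokens,
# blocking base->novel attention. The direction is an assumption.
import torch

n_base, n_novel = 5, 3
n = n_base + n_novel
blocked = torch.zeros(n, n, dtype=torch.bool)
blocked[:n_base, n_base:] = True                  # base queries ignore novel keys

scores = torch.randn(n, n).masked_fill(blocked, float("-inf"))
attn = scores.softmax(dim=-1)                     # rows are queries
print(attn[:n_base, n_base:].sum().item())        # 0.0: base unaffected by novel
```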

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory


While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augmented Generation (RAG), though non-parametric, has its own limitations: it lacks structure, complicates interpretability, and makes it hard to effectively manage stored knowledge. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our......
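
Because the memory is structured and explicit, it can be pictured as a triple store the finetuned LLM drives through read/write commands. The method names and triple format below are hypothetical, for illustration only:

```python
# Sketch: a triple-store memory with explicit read/write calls. API names are
# hypothetical; MemLLM defines its own command interface for the finetuned LLM.
from typing import Optional

class TripleMemory:
    def __init__(self):
        self.store = set()

    def memory_write(self, subj: str, rel: str, obj: str) -> None:
        self.store.add((subj, rel, obj))

    def memory_read(self, subj: Optional[str] = None, rel: Optional[str] = None):
        return [t for t in self.store
                if (subj is None or t[0] == subj) and (rel is None or t[1] == rel)]

mem = TripleMemory()
mem.memory_write("Marie Curie", "born_in", "Warsaw")   # illustrative fact
print(mem.memory_read(subj="Marie Curie"))
```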

SNP: Structured Neuron-level Pruning to Preserve Attention Scores


Multi-head self-attention (MSA) is a key component of Vision Transformers (ViTs), which have achieved great success in various vision tasks. However, their high computational cost and memory footprint hinder their deployment on resource-constrained devices. Conventional pruning approaches can only compress and accelerate the MSA module using head pruning, although the head is not an atomic unit. To address this issue, we propose a novel graph-aware neuron-level pruning method, Structured Neuron-level Pruning (SNP). SNP prunes neurons with less informative attention scores and eliminates redundancy among heads. Specifically, it prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. Our proposed method effectively compresses and accelerates Transformer-based models for both edge devices and server......
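
The trick is that Q and K share an inner dimension, so query/key neurons can be pruned in matched pairs while approximately preserving the attention logits QKᵀ. A sketch with an assumed second-moment importance score:

```python
# Sketch: prune the shared inner dimension of Q/K projections in matched pairs,
# keeping the dims that contribute most to Q K^T. Importance score is a guess.
import torch

def prune_qk(Wq, Wk, X, keep):
    """Wq, Wk: (d_head, d_model); X: (N, d_model) calibration activations."""
    Q, K = X @ Wq.T, X @ Wk.T                     # (N, d_head)
    importance = (Q**2).mean(0) * (K**2).mean(0)  # per-dim contribution proxy
    idx = importance.topk(keep).indices
    return Wq[idx], Wk[idx]                       # same rows removed from both

Wq, Wk = torch.randn(64, 128), torch.randn(64, 128)
X = torch.randn(256, 128)
Wq_s, Wk_s = prune_qk(Wq, Wk, X, keep=32)
print(Wq_s.shape, Wk_s.shape)                     # (32, 128) each
```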
