[Deep Learning] Comparing acceleration options for sdwebui A1111: xformers vs Flash Attention 2

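Contents

Supporting references
Conclusions from the references
sdwebui A1111 speed comparison
sdxl
- xformers with ControlNet, sdxl
- sdpa (--opt-sdp-no-mem-attention) with ControlNet, sdxl
- sdpa (--opt-sdp-attention) with ControlNet, sdxl
- Neither xformers nor sdpa, with ControlNet, sdxl
- Neither xformers nor sdpa, plain generation, sdxl
- sdpa, plain generation without ControlNet: generation time
sd1.5
- Neither xformers nor sdpa, sd1.5 + hires fix 2x, plain generation at 512
- sdpa, sd1.5 + hires fix 2x, plain generation at 512
- Neither xformers nor sdpa, sd1.5, plain generation at 512
- sdpa, sd1.5, plain generation at 512
- Other speed measurements
Conclusion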

Supporting references

xformers can use Flash Attention v2:
https://github.com/facebookresearch/xformers/issues/795
https://github.com/vllm-project/vllm/issues/485
https://github.com/facebookresearch/xformers/issues/832

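PyTorch supports Flash Attention 2.
Flash Attention 2 is an improved version of Flash Attention that offers higher performance and better parallelism. The paper was published in July 2023 (arXiv:2307.08691), and the kernel was integrated into PyTorch 2.2.
PyTorch 2.2, released on January 30, 2024, includes the following relevant updates:

- Updates the Flash Attention kernel behind scaled_dot_product_attention to v2 (the release blog reports roughly a 2x speedup over earlier versions)
- Adds aarch64 optimizations (listed separately in the release notes, not specific to Flash Attention)

To use Flash Attention 2, install PyTorch 2.2 or later and call torch.nn.functional.scaled_dot_product_attention(). There is no torch.nn.functional.flash_attn() function; PyTorch picks the flash, memory-efficient, or math backend automatically based on the inputs and the hardware.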
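As a minimal sketch of what calling SDPA looks like, the snippet below runs F.scaled_dot_product_attention() and compares it with a hand-written softmax(QK^T / sqrt(d)) V reference. The tensor shapes are illustrative assumptions, not values taken from sdwebui. On CPU the flash kernel is not used, so this only demonstrates the API and its numerical equivalence, not the speedup.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative shapes (assumed): batch=1, heads=8, sequence length=64, head dim=40.
q = torch.randn(1, 8, 64, 40)
k = torch.randn(1, 8, 64, 40)
v = torch.randn(1, 8, 64, 40)

# PyTorch chooses the backend (flash, memory-efficient, or math) on its own.
out = F.scaled_dot_product_attention(q, k, v)

# Hand-written reference: softmax(Q K^T / sqrt(d)) V.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
ref = torch.softmax(scores, dim=-1) @ v

max_err = (out - ref).abs().max().item()
print(f"torch {torch.__version__}, max abs difference: {max_err:.2e}")
```

On a CUDA GPU with half-precision inputs, the same call is the one that can dispatch to the Flash Attention 2 kernel in PyTorch 2.2 and later.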
Some resources on Flash Attention 2:
PyTorch forum thread: https://discuss.pytorch.org/t/flash-attention/174955
Flash Attention 2 paper: https://arxiv.org/abs/2307.08691
Flash Attention 2 GitHub repository: https://github.com/Dao-AILab/flash-attention
PyTorch pull request: https://github.com/pytorch/pytorch/pull/105602
PyTorch 2.2 release blog: https://pytorch.org/blog/pytorch2-2/
SDPA documentation (PyTorch 2.2): https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html
Triton kernels (PyTorch 2.3 release blog):
https://pytorch.org/blog/pytorch2-3/

SDPA vs. xformers
https://github.com/huggingface/diffusers/issues/3793
F.scaled_dot_product_attention() is PyTorch's SDPA operator.
xformers.ops.memory_efficient_attention is the corresponding xformers operator.
https://github.com/lucidrains/memory-efficient-attention-pytorch/blob/main/memory_efficient_attention_pytorch/memory_efficient_attention.py
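
To illustrate the idea that memory-efficient and flash-style kernels share, here is a pure-Python sketch for a single query vector. It is not the code from the repository above, nor the xformers or PyTorch kernel; the function names and the toy data are hypothetical. It shows that visiting keys and values chunk by chunk with a running max and running sum (online softmax) gives the same result as the naive formula without ever storing the full row of scores.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d)) weighted average of values, one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def chunked_attention(q, keys, values, chunk_size=2):
    """Same result, computed chunk by chunk with an online softmax."""
    d = len(q)
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


# Toy data (hypothetical): one query, five keys and values of dimension 2 and 3.
q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.5, -0.3], [0.0, 2.0], [-1.0, 0.4], [0.7, 0.7]]
values = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5], [3.0, 1.0, 0.0], [0.5, 0.5, 0.5], [2.0, -1.0, 1.0]]

full = naive_attention(q, keys, values)
chunked = chunked_attention(q, keys, values, chunk_size=2)
max_diff = max(abs(a - b) for a, b in zip(full, chunked))
print(f"max difference between naive and chunked: {max_diff:.2e}")
```

Real kernels apply the same rescaling trick to whole tiles of queries and keys on the GPU, which is where the memory savings come from.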

https://github.com/facebookresearch/xformers/issues/950

sdwebui supports SDP:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/8367
https://qq742971636.blog.csdn.net/article/details/139772822
SDP attention is on par with xformers, or even slightly ahead:
[image]

In PyTorch 2.0, the flash kernel behind scaled_dot_product_attention is Flash Attention 1:
https://pytorch.org/docs/2.0/generated/torch.nn.functional.scaled_dot_product_attention.html
In PyTorch 2.2, it is Flash Attention 2:
https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html

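Conclusions from the references

In PyTorch 2.2 and later, F.scaled_dot_product_attention() uses Flash Attention 2 whenever its flash backend is selected; it can also fall back to the memory-efficient or math backend.

Recent xformers releases include a comparable Flash Attention 2 implementation.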

sdwebui A1111 speed comparison

For the meaning of the command-line flags, see:
https://qq742971636.blog.csdn.net/article/details/139772822

Test setup: IP-Adapter ControlNet

PyTorch 2.3 + xformers 0.25

25 sampling steps, with the positive and negative prompts below:

In a snowy mountain range, the young man is dressed in winter attire, facing the camera with a determined gaze. He sports a thick wool coat, knit hat, and gloves to keep warm in the frigid temperatures. His eyes, piercing and resolute, reflect the strength and resolve needed to conquer the elements and the challenging terrain.

paintings, sketches, worst quality, low quality, normal quality, lowres, blurry, text, logo, monochrome, grayscale, skin spots, acnes, skin blemishes, age spot, strabismus, wrong finger, bad anatomy, bad hands, error, missing fingers, cropped, jpeg artifacts, signature, watermark, username, dark skin, fused girls, fushion, bad feet, ugly, pregnant, vore, duplicate, morbid, mutilated, transexual, hermaphrodite, long neck, mutated hands, poorly drawn face, mutation, deformed, bad proportions, malformed limbs, extra limbs, cloned face, disfigured, gross proportions, missing arms, missing legs, extra arms, extra legs, plump, open mouth, tooth, teeth, nsfw,

sdxl

xformers with ControlNet, sdxl

xformers:

./webui.sh --enable-insecure-extension-access --skip-python-version-check --skip-torch-cuda-test  --listen --port 7860 --no-download-sd-model --api --no-half-vae --xformers

Speed:

Time taken: 11.5 sec.

A: 13.29 GB, R: 16.77 GB, Sys: 18.5/39.3945 GB (47.0%)

sdpa (--opt-sdp-no-mem-attention) with ControlNet, sdxl

sdpa:

./webui.sh --enable-insecure-extension-access --skip-python-version-check --skip-torch-cuda-test  --listen --port 7860 --no-download-sd-model --api --no-half-vae --opt-sdp-no-mem-attention

Time taken: 11.1 sec.

A: 13.29 GB, R: 14.81 GB, Sys: 16.6/39.3945 GB (42.1%)

sdpa (--opt-sdp-attention) with ControlNet, sdxl

sdpa:

./webui.sh --enable-insecure-extension-access --skip-python-version-check --skip-torch-cuda-test  --listen --port 7860 --no-download-sd-model --api --no-half-vae --opt-sdp-attention

Time taken: 11.4 sec.

A: 13.29 GB, R: 14.81 GB, Sys: 16.6/39.3945 GB (42.1%)
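
To put the three ControlNet sdxl runs above side by side, a small script can parse the "Time taken" and memory lines and report each run relative to the --xformers baseline. This is only a sketch: parse_run and the dictionary layout are hypothetical helpers, and it assumes A and R in the webui footer denote peak active and peak reserved GPU memory.

```python
import re


def parse_run(log_text):
    """Extract generation time and memory figures from a webui result footer."""
    time_match = re.search(r"Time taken:\s*([\d.]+)\s*sec", log_text)
    mem_match = re.search(r"A:\s*([\d.]+)\s*GB,\s*R:\s*([\d.]+)\s*GB", log_text)
    return {
        "time_s": float(time_match.group(1)),
        "active_gb": float(mem_match.group(1)),     # assumed meaning of "A"
        "reserved_gb": float(mem_match.group(2)),   # assumed meaning of "R"
    }


# Footer text copied from the three runs reported above.
runs = {
    "--xformers": "Time taken: 11.5 sec.\n"
                  "A: 13.29 GB, R: 16.77 GB, Sys: 18.5/39.3945 GB (47.0%)",
    "--opt-sdp-no-mem-attention": "Time taken: 11.1 sec.\n"
                                  "A: 13.29 GB, R: 14.81 GB, Sys: 16.6/39.3945 GB (42.1%)",
    "--opt-sdp-attention": "Time taken: 11.4 sec.\n"
                           "A: 13.29 GB, R: 14.81 GB, Sys: 16.6/39.3945 GB (42.1%)",
}

results = {flag: parse_run(text) for flag, text in runs.items()}
baseline = results["--xformers"]

# Differences relative to the xformers run.
deltas = {}
for flag, res in results.items():
    deltas[flag] = {
        "time_pct": (res["time_s"] - baseline["time_s"]) / baseline["time_s"] * 100,
        "reserved_gb": res["reserved_gb"] - baseline["reserved_gb"],
    }
    print(
        f"{flag}: {res['time_s']} s "
        f"({deltas[flag]['time_pct']:+.1f}% vs xformers), "
        f"reserved {res['reserved_gb']} GB "
        f"({deltas[flag]['reserved_gb']:+.2f} GB)"
    )
```

Each configuration was timed once here, so time differences of a few tenths of a second may be within run-to-run noise; the lower reserved memory of the two sdpa runs is the clearer difference.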