Notes on Setting Up a Local Environment for Stable Diffusion 3.5

1. Environment setup

Create a virtual environment:
```
conda create -n sd3.5 python=3.10
```

Install PyTorch (2.0 or later):

conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia

Make the Anaconda virtual environment available as a Jupyter kernel:

conda install ipykernel
python -m ipykernel install --user --name sd3.5 --display-name "SD3.5"
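
To confirm that a notebook is really running on the newly registered kernel, one option is to inspect the interpreter prefix from inside Jupyter. This is a minimal sketch that assumes the environment is named sd3.5 as above and lives in a directory of the same name; the helper name running_in_env is hypothetical.

```python
import os
import sys


def running_in_env(env_name: str, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix ends with the given conda env name."""
    return os.path.basename(os.path.normpath(prefix)) == env_name


# Run this in a notebook cell after selecting the "SD3.5" kernel.
print("interpreter:", sys.executable)
print("in sd3.5 env:", running_in_env("sd3.5"))
```

If the second line prints False, the notebook is probably still attached to the base kernel rather than the "SD3.5" one.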

Install transformers and tokenizers:

pip install transformers==4.38.2 
pip install tokenizers==0.15.2

Install the latest version of diffusers:
```
pip install -U diffusers
```
Install the quantization library bitsandbytes to reduce GPU VRAM usage:
```
pip install bitsandbytes
```
Install sentencepiece:
```
pip install sentencepiece
```

Install other libraries as the project requires:

pip install matplotlib
pip install numpy==1.26.4    # downgrade; otherwise errors sometimes occur
pip install accelerate
pip install protobuf==3.19.0
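
Since several of the packages above are pinned, it can help to compare what is actually installed against those pins. The following is a rough sketch using only the standard library; the PINNED table simply restates the versions from these notes, and the helper name check_versions is hypothetical. It assumes each package exposes standard metadata, which conda-installed packages usually but not always do.

```python
from importlib import metadata

# Pins taken from the commands in these notes; None means "any version".
PINNED = {
    "torch": "2.2.2",
    "transformers": "4.38.2",
    "tokenizers": "0.15.2",
    "numpy": "1.26.4",
    "protobuf": "3.19.0",
    "diffusers": None,
    "bitsandbytes": None,
    "sentencepiece": None,
    "accelerate": None,
}


def check_versions(pins=PINNED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" when comparing.
        ok = installed is not None and (
            wanted is None or installed.split("+")[0] == wanted
        )
        report[name] = (installed, ok)
    return report


for name, (installed, ok) in check_versions().items():
    print(f"{name:15s} {installed or 'missing':12s} {'ok' if ok else 'CHECK'}")
```

Any row marked CHECK is either missing or at a different version than the one recorded here.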

2. Troubleshooting

If you get the error: Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 960 column 3
Fix it by downgrading transformers and tokenizers:

pip install transformers==4.38.2
pip install tokenizers==0.15.2

If you get the error: ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.
Install sentencepiece:

pip install sentencepiece

If you get the error: ValueError: The current PyTorch version does not support the scaled_dot_product_attention function.
Fix: install PyTorch 2.0 or later.
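
A quick way to see whether the active environment is affected is to check the PyTorch version and whether the function exists. This is a small sketch; the helper name supports_sdpa is hypothetical, and it relies only on the fact that scaled_dot_product_attention was introduced in PyTorch 2.0.

```python
def supports_sdpa(version: str) -> bool:
    """True if a PyTorch version string is 2.0 or later, e.g. '2.2.2+cu121'."""
    major = int(version.split(".")[0])
    return major >= 2


try:
    import torch
    import torch.nn.functional as F
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    print("torch", torch.__version__)
    print("version is 2.0 or later:", supports_sdpa(torch.__version__))
    print("function present:", hasattr(F, "scaled_dot_product_attention"))
```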

If you get the error: T5Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the

Fix:

pip install protobuf==3.19.0
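
The four errors in this section can be collected into a small lookup for quick reference. This is only an illustrative sketch: the table restates the fixes recorded in these notes, and the names KNOWN_FIXES and suggest_fix are hypothetical.

```python
# Each entry: (substring of the error message, fix recorded in these notes).
KNOWN_FIXES = [
    ("untagged enum PyPreTokenizerTypeWrapper",
     "pip install transformers==4.38.2 tokenizers==0.15.2"),
    ("Cannot instantiate this tokenizer from a slow version",
     "pip install sentencepiece"),
    ("does not support the scaled_dot_product_attention",
     "install PyTorch 2.0 or later"),
    ("requires the protobuf library",
     "pip install protobuf==3.19.0"),
]


def suggest_fix(error_message: str):
    """Return the recorded fix for a known error message, or None."""
    for needle, fix in KNOWN_FIXES:
        if needle in error_message:
            return fix
    return None


print(suggest_fix("T5Converter requires the protobuf library but it was not found"))
```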

3. Test run

Official example:

import torch
from diffusers import StableDiffusion3Pipeline

# Local directory holding the downloaded model weights
local_path = "/home/aic/diffusion_models/stable-diffusion-3.5-large/"
pipe = StableDiffusion3Pipeline.from_pretrained(local_path, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")
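
The setup installs bitsandbytes to save VRAM, but the example above loads everything in bfloat16. On a GPU with less memory, one possible approach is to load the transformer in 4-bit NF4. This is a hedged sketch, not a tested recipe from these notes: it assumes a diffusers release recent enough to export BitsAndBytesConfig and SD3Transformer2DModel, and the function name load_nf4_pipeline is hypothetical.

```python
def load_nf4_pipeline(model_path: str):
    """Load SD3.5 with the transformer quantized to 4-bit NF4 via bitsandbytes."""
    # Imports live inside the function so that defining it needs no GPU libraries.
    import torch
    from diffusers import (
        BitsAndBytesConfig,
        SD3Transformer2DModel,
        StableDiffusion3Pipeline,
    )

    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = SD3Transformer2DModel.from_pretrained(
        model_path,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16,
    )
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
    # Move submodules to the GPU only while they are in use.
    pipe.enable_model_cpu_offload()
    return pipe


# Example call (needs a CUDA GPU and the model weights on disk):
# pipe = load_nf4_pipeline("/home/aic/diffusion_models/stable-diffusion-3.5-large/")
```

With CPU offload enabled, the pipeline should not also be moved with pipe.to("cuda"); the returned object is otherwise called the same way as in the bfloat16 example.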

Custom example:

Scene (translated from the original Chinese description): "An ancient-style Chinese female student sits in a modern computer classroom learning programming."

Prompt: "An ancient-style Chinese female student sitting in a modern computer classroom learning programming, focused eyes, traditional Hanfu attire, modern technology, code editor, keyboard, mouse, fusion of digital age and traditional aesthetics, rich in detail, high-definition quality."

import matplotlib.pyplot as plt
from IPython.display import display

# Reuses the `pipe` object created in the official example.
prompt = "An ancient-style Chinese female student sitting in a modern computer classroom learning programming, focused eyes, traditional Hanfu attire, modern technology, code editor, keyboard, mouse, fusion of digital age and traditional aesthetics, rich in detail, high-definition quality."
image = pipe(
    prompt=prompt,
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]

image.save("girls.png")
plt.imshow(plt.imread("girls.png"))
plt.axis("off")  # hide the axes
display(plt.gcf())