During both training and inference, the diffusion model noises and denoises images in the range [-1, 1], while the images fed into the rep (representation) encoder to produce the condition should be in the range [0, 1] (before ImageNet normalization).
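For reference, both range conversions are simple affine maps; a minimal sketch (the helper names to_model_range / to_unit_range are mine, not from the original code):

import torch

def to_model_range(x: torch.Tensor) -> torch.Tensor:
    # [0, 1] -> [-1, 1], the range the diffusion model operates in
    return x * 2 - 1

def to_unit_range(x: torch.Tensor) -> torch.Tensor:
    # [-1, 1] -> [0, 1], e.g. before ImageNet normalization or saving
    return (x + 1) / 2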
Training:
Load the dataset:
import os
from torchvision import datasets, transforms

# ToTensor() converts HWC uint8 [0, 255] to CHW float [0, 1]
transform_train = transforms.Compose([
    transforms.ToTensor()])
dataset_train = datasets.ImageFolder(os.path.join(args.data_path, 'train'), transform=transform_train)
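Optionally, a quick range check on one batch catches normalization mistakes early; a minimal sketch, assuming a standard DataLoader (batch size is arbitrary):

from torch.utils.data import DataLoader

loader = DataLoader(dataset_train, batch_size=4)
images, _ = next(iter(loader))
# ToTensor() output should already lie in [0, 1]
assert images.min() >= 0.0 and images.max() <= 1.0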
Map the images into [-1, 1]:
# image to [-1, 1] to be compatible with LDM
images = images * 2 - 1
To build the rep condition, the images in [-1, 1] are first mapped back to [0, 1], then ImageNet-normalized and resized before entering the image encoder; the resulting rep is fed to the diffusion model as the condition:
if self.rep_cond:
    with torch.no_grad():
        mean = torch.Tensor([0.485, 0.456, 0.406]).cuda().unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
        std = torch.Tensor([0.229, 0.224, 0.225]).cuda().unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
        # scale x back to [0, 1], since x is currently in [-1, 1]
        x_normalized = (x + 1) / 2
        # ImageNet normalization expected by the pretrained image encoder
        x_normalized = (x_normalized - mean) / std
        x_normalized = torch.nn.functional.interpolate(x_normalized, 224, mode='bicubic', align_corners=False)
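The snippet stops just before the encoder forward pass; continuing inside the same no_grad block, a sketch of producing the rep (the attribute name self.rep_encoder and the L2 normalization are assumptions, not from the original code):

        rep = self.rep_encoder(x_normalized)  # frozen pretrained encoder, e.g. (bz, rep_dim); hypothetical name
        rep = torch.nn.functional.normalize(rep, dim=-1)  # optional L2 norm of the rep, an assumption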
Run the diffusion model's training step on the image:
# x is in [-1, 1]; encode into the first-stage (VAE) latent space
encoder_posterior = self.encode_first_stage(x)
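In LDM-style code, the posterior is then converted into the working latent by get_first_stage_encoding, which samples from it and applies the scale factor; roughly (paraphrased from the latent-diffusion codebase, not verified against this repo):

# sample z from the VAE posterior and rescale it for the diffusion model
z = self.get_first_stage_encoding(encoder_posterior)
# internally this is approximately: z = self.scale_factor * encoder_posterior.sample()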
Inference:
After the model generates:
# (bz, 3, 256, 256); the decoder output can slightly overshoot [-1, 1]
gen_imgs = model.decode_first_stage(sampled_latents)
# clamp back into [-1, 1]
gen_imgs = torch.clamp(gen_imgs, -1., 1.)
# map to [0, 1] for saving
gen_images_batch = (gen_imgs + 1.0) / 2
gen_images_batch = gen_images_batch.detach().cpu()
import numpy as np
import cv2

# save each image in the batch
for b_id in range(gen_images_batch.size(0)):
    # CHW float [0, 1] -> HWC float [0, 255]
    gen_img = np.clip(gen_images_batch[b_id].numpy().transpose([1, 2, 0]) * 255, 0, 255)
    # RGB -> BGR for cv2.imwrite
    gen_img = gen_img.astype(np.uint8)[:, :, ::-1]
    # per-image filename so each iteration does not overwrite the previous file
    cv2.imwrite(f'test_{b_id}.png', gen_img)
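Alternatively, torchvision.utils.save_image accepts float tensors in [0, 1] directly, skipping the manual uint8/BGR handling; a minimal sketch:

from torchvision.utils import save_image

for b_id in range(gen_images_batch.size(0)):
    # save_image expects CHW floats in [0, 1]; no BGR conversion needed
    save_image(gen_images_batch[b_id], f'test_{b_id}.png')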