Large Model Development in Practice, Part 5: Multimodal Text-to-Image Model APIs

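Text-to-image generation uses a large AI model to turn a natural-language description into a matching image. It is currently one of the most active areas in the large-model space, with many companies competing in it.

Today we look at how to call a web API to generate images from text. We usually start with OpenAI's interface: OpenAI is the benchmark that other large models take as their reference, and many of them replicate its API, with largely the same endpoint shapes and parameter definitions. Once you know the OpenAI interface, you can call many other large models as well.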

I. OpenAI's text-to-image model: DALL·E

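The OpenAI Images API provides three ways to work with images:

Generate an image from a text prompt (DALL·E 3 and DALL·E 2)
Edit (replace) regions of an existing image, producing an edited version based on a new text prompt (DALL·E 2 only)
Create variations of an existing image (DALL·E 2 only)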

This article covers the first method, generating images from text.

For more on what changed in DALL·E 3, see the OpenAI Cookbook: https://cookbook.openai.com/articles/what_is_new_with_dalle_3

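Image generation API parameters

prompt: (required) a text description of the desired image. The maximum length is 1000 characters for DALL·E 2 and 4000 characters for DALL·E 3.
model ('dall-e-2' or 'dall-e-3'): the model to use. Set it to 'dall-e-3' explicitly, because it defaults to 'dall-e-2' when omitted.
style ('natural' or 'vivid'): the style of the generated image. 'vivid' pushes the model toward hyper-real, dramatic images; 'natural' produces more natural, less hyper-real images. Defaults to 'vivid'.
quality ('standard' or 'hd'): the quality of the generated image. 'hd' produces finer detail and greater consistency across the image. Defaults to 'standard'.
n (int): the number of images to generate, from 1 to 10. Defaults to 1. dall-e-3 supports only n=1.
size (...): the dimensions of the generated image. For DALL·E 2 it must be 256x256, 512x512, or 1024x1024. For DALL·E 3 it must be 1024x1024, 1792x1024, or 1024x1792.
response_format ('url' or 'b64_json'): the format in which the image is returned. Defaults to 'url'. The URL is temporary and expires after a short period (generally cited as 1 to 2 hours), so if your code needs to keep the image, save it locally right away.
user (str): a unique identifier for your end user, which helps OpenAI monitor and detect abuse.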

For more detail, see the official API reference: https://platform.openai.com/docs/api-reference/images/create
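
The constraints in the parameter list can be checked locally before spending an API call. The following is a minimal sketch, not part of the OpenAI SDK: `LIMITS` and `check_image_request` are hypothetical names, and the limits are simply copied from the list above, so they would need updating if the API changes.

```python
# Hypothetical helper: the limits mirror the parameter list in this article.
LIMITS = {
    "dall-e-2": {
        "max_prompt": 1000,
        "sizes": {"256x256", "512x512", "1024x1024"},
        "max_n": 10,
    },
    "dall-e-3": {
        "max_prompt": 4000,
        "sizes": {"1024x1024", "1792x1024", "1024x1792"},
        "max_n": 1,
    },
}


def check_image_request(model, prompt, size, n=1):
    """Return a list of problems; an empty list means the request looks valid."""
    if model not in LIMITS:
        return [f"unknown model: {model}"]
    limits = LIMITS[model]
    problems = []
    if not prompt:
        problems.append("prompt is required")
    elif len(prompt) > limits["max_prompt"]:
        problems.append(f"prompt exceeds {limits['max_prompt']} characters")
    if size not in limits["sizes"]:
        problems.append(f"size {size} is not supported by {model}")
    if not 1 <= n <= limits["max_n"]:
        problems.append(f"n must be between 1 and {limits['max_n']} for {model}")
    return problems


# A 512x512 image with n=2 is fine for dall-e-2 but fails twice for dall-e-3.
print(check_image_request("dall-e-3", "a white siamese cat", "512x512", n=2))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.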

Code examples

1. Standard quality (quality="standard")

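from openai import OpenAI

# Replace "sk-xxx" with your own key. base_url points at the OpenAI-compatible
# endpoint used in this article; omit it to call OpenAI directly.
client = OpenAI(api_key="sk-xxx", base_url="https://vip.apiyi.com/v1")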

response = client.images.generate(
    model="dall-e-3",
    prompt="In the style of a Polaroid photo, a cute Japanese high school girl, dressed in her school uniform, with short black hair, is smiling and posing at the entrance of her high school. The soft, warm tones of the Polaroid film capture the gentle morning light, highlighting the neat pleats of her skirt and the crisp white collar of her blouse. Her cheerful expression and bright eyes reflect the excitement of a new day, and the background shows the familiar brick walls and arched entryway of the school, with a few cherry blossom petals scattered on the ground, adding a touch of seasonal beauty.",
    size="1024x1024",
    quality="standard",
    n=1,
)

image_url = response.data[0].url
print(image_url)

Here the model is dall-e-3, quality is set to standard, and the size is 1024x1024.

Result:

https://dalleprodsec.blob.core.windows.net/private/images/533473e0-f83d-4cde-980f-c392386ed3c9/generated_00.png?se=2025-02-17T14%3A09%3A29Z&sig=%2FoN77%2Bkb%2FKA4OY84bNaQ7N%2BZ1ZSVC2js1%2Ff9UoD6W4s%3D&ske=2025-02-22T15%3A19%3A26Z&skoid=e52d5ed7-0657-4f62-bc12-7e5dbb260a96&sks=b&skt=2025-02-15T15%3A19%3A26Z&sktid=33e01921-4d64-4f8c-a055-5bdaffd5e33d&skv=2020-10-02&sp=r&spr=https&sr=b&sv=2020-10-02
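
Because the returned link is temporary, it helps to know when it expires and to download the file promptly. The sketch below reads the `se` query parameter from a URL shaped like the one above. Treating `se` as the expiry timestamp is an assumption based on the Azure shared-access-signature convention; it is not part of the OpenAI API contract and other providers may format their URLs differently. `url_expiry` and `download_image` are hypothetical helper names.

```python
import shutil
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def url_expiry(url):
    """Return the expiry time encoded in the `se` parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # parse_qs has already percent-decoded the value, e.g. 2025-02-17T14:09:29Z
    parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def download_image(url, path):
    """Stream the image at `url` into a local file before the link expires."""
    with urlopen(url) as response, open(path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)


# Shortened, made-up URL with the same `se` format as the result above.
sample = "https://example.com/generated_00.png?se=2025-02-17T14%3A09%3A29Z&sp=r"
print(url_expiry(sample))
```

In real use, you would call `download_image(image_url, "cat.png")` right after `client.images.generate(...)` returns.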

2. HD quality (quality="hd")

response = client.images.generate(
    model="dall-e-3",
    prompt="a white siamese cat",
    size="1024x1024",
    quality="hd",
    n=1,
)

print(response.data[0].url)

3. Natural style (style="natural")

response = client.images.generate(
    model="dall-e-3",
    prompt="a white siamese cat",
    size="1024x1024",
    quality="standard",
    n=1,
    style="natural"
)
print(response.data[0].url)

4. Vivid style (style="vivid")

response = client.images.generate(
    model="dall-e-3",
    prompt="a white siamese cat",
    size="1024x1024",
    quality="standard",
    n=1,
    style="vivid"
)
print(response.data[0].url)
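
All the calls above return a temporary URL. To avoid the expiry problem altogether, you can request `response_format="b64_json"` and write the bytes to disk yourself. The following is a minimal sketch of the decoding step only; `decode_b64_image` and `save_b64_image` are hypothetical helper names, and the sample string is a stand-in for real image data.

```python
import base64
from pathlib import Path


def decode_b64_image(b64_data):
    """Decode a base64 string (as found in a b64_json response) into raw bytes."""
    return base64.b64decode(b64_data, validate=True)


def save_b64_image(b64_data, path):
    """Decode the base64 payload and write it to `path`; return the Path written."""
    out_path = Path(path)
    out_path.write_bytes(decode_b64_image(b64_data))
    return out_path


# Stand-in payload: base64 for the bytes b"hello", not a real image.
print(decode_b64_image("aGVsbG8="))
```

With the OpenAI SDK, the string to pass in would come from `response.data[0].b64_json` after calling `client.images.generate(..., response_format="b64_json")`.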

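II. Zhipu image generation models

cogview-4 is designed for image generation. It interprets the user's text description quickly and accurately, making AI-generated images more precise and personalized.

Model codes: cogview-4 (latest) and cogview-3-flash. CogView-4 is priced at 0.06 yuan per call.
You can try it here: https://www.bigmodel.cn/trialcenter/modeltrial?modelCode=cogview-4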

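Request parameters

Parameter	Type	Required	Description
model	String	Yes	Model code
prompt	String	Yes	Text description of the desired image
size	String	No	Image size. Allowed values: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440. Defaults to 1024x1024.
user_id	String	No	Unique ID of the end user, which helps the platform act on end-user violations, generation of illegal or harmful content, and other abuse. Length: 6 to 128 characters.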
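
The rules in the request table can be enforced before the request is sent. The sketch below is a minimal illustration under the assumption that the table above is complete and current; `COGVIEW_SIZES` and `build_cogview_request` are hypothetical names and are not part of the zhipuai SDK.

```python
# Allowed sizes copied from the request parameter table above.
COGVIEW_SIZES = {
    "1024x1024", "768x1344", "864x1152", "1344x768",
    "1152x864", "1440x720", "720x1440",
}


def build_cogview_request(model, prompt, size="1024x1024", user_id=None):
    """Validate the arguments and return them as a request dictionary."""
    if not model:
        raise ValueError("model is required")
    if not prompt:
        raise ValueError("prompt is required")
    if size not in COGVIEW_SIZES:
        raise ValueError(f"unsupported size: {size}")
    request = {"model": model, "prompt": prompt, "size": size}
    if user_id is not None:
        # The table requires 6 to 128 characters for user_id.
        if not 6 <= len(user_id) <= 128:
            raise ValueError("user_id must be 6 to 128 characters long")
        request["user_id"] = user_id
    return request


print(build_cogview_request("cogview-4", "一只可爱的小猫咪", size="1344x768"))
```

Whether the installed SDK version accepts every one of these keys as a keyword argument is worth checking against its documentation before unpacking the dictionary into a call.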

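Response parameters

Parameter	Type	Description
created	String	Request creation time, as a Unix timestamp in seconds.
data	List	Array containing the generated image URLs. Currently the array contains a single image.
url	String	Image link. The temporary link is valid for 30 days, so save the image elsewhere in time.
content_filter	List	Content-safety information.
role	String	Stage where the safety check applied: role = assistant (model inference), role = user (user input), role = history (conversation history)
level	Integer	Severity, from 0 to 3. Level 0 is the most severe and 3 is minor.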

Example call

from zhipuai import ZhipuAI
client = ZhipuAI(api_key="")  # Fill in your own API key

response = client.images.generations(
    model="cogview-4",  # Code of the model to call
    prompt="一只可爱的小猫咪",  # "a cute little kitten"
)

print(response.data[0].url)

Example response

{
  "created": 1703485556,
  "data": [
      {
          "url": "https://......"
      }
  ]
}
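
A raw response like this can be unpacked with the standard library alone. The following is a minimal sketch that assumes the field layout described in the response parameter table, including `content_filter` entries that carry `role` and `level` keys; `summarize_response` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone


def summarize_response(payload):
    """Return (creation time in UTC, image URLs, flagged (role, level) pairs)."""
    created = datetime.fromtimestamp(int(payload["created"]), tz=timezone.utc)
    urls = [item["url"] for item in payload.get("data", []) if "url" in item]
    flagged = [
        (entry.get("role"), entry.get("level"))
        for entry in payload.get("content_filter", [])
    ]
    return created, urls, flagged


# The example response from this article, with the URL left elided as in the source.
raw = '{"created": 1703485556, "data": [{"url": "https://......"}]}'
created, urls, flagged = summarize_response(json.loads(raw))
print(created.isoformat(), urls, flagged)
```

Converting `created` with `int(...)` handles both a numeric timestamp, as in the example, and a string, as the table's type column suggests.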
