AI Digital Human Conversations: A Source Code Walkthrough of the RealChar Framework

Part 0. Feature Overview

Hold text or voice conversations with a virtual character (no visual avatar).

Demo: RealChar.
Repository: GitHub - Shaunwei/RealChar: 🎙️🤖Create, Customize and Talk to your AI Character/Companion in Realtime (All in One Codebase!). Have a natural seamless conversation with AI everywhere (mobile, web and terminal) using LLM OpenAI GPT3.5/4, Anthropic Claude2, Chroma Vector DB, Whisper Speech2Text, ElevenLabs Text2Speech🎙️🤖

Part 1. Overall Architecture

Part 2. Technology Stack

✅Web: Vanilla JS, WebSockets
✅Mobile: Swift, WebSockets
✅Backend: FastAPI, SQLite, Docker
✅Data Ingestion: LlamaIndex, Chroma
✅LLM Orchestration: LangChain, Chroma
✅LLM: OpenAI GPT3.5/4, Anthropic Claude 2
✅Speech to Text: Local Whisper, OpenAI Whisper API, Google Speech to Text
✅Text to Speech: ElevenLabs
✅Voice Clone: ElevenLabs

Part 3. Installation

Step 1. Clone the repository

git clone https://github.com/Shaunwei/RealChar.git && cd RealChar

Step 2. Install dependencies

Install the audio libraries portaudio and ffmpeg:

# for ubuntu
sudo apt update
sudo apt install portaudio19-dev
sudo apt install ffmpeg

# for mac
brew install portaudio
brew install ffmpeg

Install the remaining Python dependencies:

pip install -r requirements.txt

Step 3. On first use, create an empty SQLite database

sqlite3 test.db "VACUUM;"
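If the sqlite3 command-line tool is not available, the same empty database file can probably be created from Python instead. The following is a minimal sketch using only the standard library; the helper name create_empty_db is made up for illustration, and test.db is the file name taken from the command above.

```python
import sqlite3
from pathlib import Path


def create_empty_db(path: str = "test.db") -> Path:
    """Create an empty SQLite database file, similar to `sqlite3 test.db "VACUUM;"`."""
    # isolation_level=None puts the connection in autocommit mode,
    # because VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return Path(path)


if __name__ == "__main__":
    print("created", create_empty_db())
```

The schema itself is still created by the alembic migration in the next step; this only produces the empty file that the migration targets.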

Step 4. Run the database migrations

alembic upgrade head

Step 5. Configure .env: fill in your API keys and related settings

cp .env.example .env
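Before starting the server it can help to confirm that the keys you need are actually filled in. The sketch below parses simple KEY=VALUE lines and reports the ones that are missing or empty. The key list is an assumption: ELEVEN_LABS_API_KEY is mentioned later in this article, while OPENAI_API_KEY is a guess, so check .env.example for the real names. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed key names: ELEVEN_LABS_API_KEY appears in this article,
# OPENAI_API_KEY is an assumption. Check .env.example for the real list.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVEN_LABS_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("Missing keys:", missing_keys(env_file.read_text()))
```

This parser is deliberately simple and does not handle inline comments or multi-line values.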

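Note: because WebRTC is used, the page must be served over HTTPS, so a certificate has to be installed for local debugging. The steps are as follows:

1. Install mkcert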

brew install mkcert

brew install nss

nss is optional; skip it if you do not use Firefox or do not need to test in it.

2. Generate the certificate and add it to the system trust store

mkdir -p ~/.cert

mkcert -key-file ~/.cert/key.pem -cert-file ~/.cert/cert.pem "localhost"

mkcert -install

3. Modify the server launcher to pass in the certificate

import subprocess
import click

@click.command(context_settings={"ignore_unknown_options": True})
@click.argument('args', nargs=-1, type=click.UNPROCESSED)
def run_uvicorn(args):
    click.secho("Running uvicorn server...", fg='green')
    # Pass the mkcert key and certificate so that uvicorn serves HTTPS.
    # Replace the /Users/chenxiangli paths with your own home directory.
    subprocess.run(["uvicorn", "realtime_ai_character.main:app",
                    "--host", "localhost",
                    "--ws-ping-interval", "60", "--ws-ping-timeout", "60",
                    "--timeout-keep-alive", "60",
                    "--ssl-keyfile", "/Users/chenxiangli/.cert/key.pem",
                    "--ssl-certfile", "/Users/chenxiangli/.cert/cert.pem"] + list(args))

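The snippet above hardcodes one user's home directory. One possible way to avoid that is to build the argument list from a certificate directory, as in this sketch. The function name build_uvicorn_cmd is hypothetical and not part of RealChar; the flags are the same ones used above, and ~/.cert is the directory from the mkcert step.

```python
from pathlib import Path


def build_uvicorn_cmd(cert_dir: Path, extra_args=()) -> list:
    """Build the uvicorn command line with TLS files taken from cert_dir."""
    key_file = cert_dir / "key.pem"
    cert_file = cert_dir / "cert.pem"
    return [
        "uvicorn", "realtime_ai_character.main:app",
        "--host", "localhost",
        "--ws-ping-interval", "60",
        "--ws-ping-timeout", "60",
        "--timeout-keep-alive", "60",
        "--ssl-keyfile", str(key_file),
        "--ssl-certfile", str(cert_file),
        *extra_args,
    ]


if __name__ == "__main__":
    # ~/.cert is where the mkcert step wrote key.pem and cert.pem.
    print(" ".join(build_uvicorn_cmd(Path.home() / ".cert")))
```

The returned list can be handed to subprocess.run in place of the literal list, so the same launcher works on any machine that followed the mkcert steps.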
Step 6. Start the server via cli.py or uvicorn

python cli.py run-uvicorn
# or
uvicorn realtime_ai_character.main:app

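Step 7. Launch the client:
- For better conversation quality, GPT-4 is recommended; headphones are recommended to avoid echo
- Open https://localhost:8000 in a browser
Step 8. Pick a character and start talking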

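Part 4. Creating a Character

1. Character memory and signature voice generation

A character is defined by two parts: its memory files and its voice. The memory files matter most, and they are generated with the help of GPT. To create a character, first check whether GPT already knows about that character. If it does, the whole process is much easier.

(1) Creating the character memory

Many different file types can be included, but the most important one is a CSV file.

The following two parts describe how the memory files for Loki were created: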

1. How to generate 50 examples of how Loki talks:
    Prompt: Help me find 50 examples of how Loki from Marvel Cinematic Universe talks. I want to generate a format like "context", "quote". It needs to be in csv format. The quote should only include Loki's words, represent Loki's personality and how he talks, and be an actual quote from a movie or novel, not made up. The context should be unique.
    Reference: https://chat.openai.com/share/73f29bc2-bbfe-43f1-a3a7-a3ce82a299c3
    Format: make sure there are three columns; see this file for the exact format: https://github.com/Shaunwei/Realtime-AI-Character/blob/main/realtime_ai_character/chacater_catalog/marvel_loki/data/talk.csv
2. How to generate the file describing Loki's background:
    Prompt 1: Do you know Loki from Marvel movies?
    Prompt 2: Write me a simple system prompt for a new version of you to be Loki the character, and the new version of you can speak and sound like Loki. Tell it as first person. Here is a previous example for a character.
    Prompt 3: Refine and simplify it into 100 words.
    Reference prompts: https://chat.openai.com/share/a2a213c7-cb1e-441e-a651-129333fefb72
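GPT output does not always come back as clean CSV, so a quick structural check before adding the file to a character may save some debugging. The sketch below makes no assumption about column names or count: it treats the first row as the header and flags any later row with a different number of fields or an empty cell. The function name check_talk_csv is made up for illustration.

```python
import csv
import io


def check_talk_csv(text: str) -> list:
    """Return a list of problems found in a talk.csv-style file."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, problems = rows[0], []
    # Data rows are numbered from 2 because row 1 is the header.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} columns, got {len(row)}")
        elif any(not cell.strip() for cell in row):
            problems.append(f"row {row_no}: empty cell")
    return problems
```

Calling check_talk_csv(open("talk.csv", encoding="utf-8").read()) returns an empty list when every row matches the header. Quotes that contain commas must be wrapped in double quotes in the file, otherwise they will be reported as having too many columns.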

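(2) Synthesizing the character's voice (with ElevenLabs)

a. Collect data

Before you start, you need voice data. Download high-quality, vocals-only audio clips. See the training_data folder for reference.

If you build your own dataset, make sure the audio is high quality: no background noise and clear pronunciation.
The audio must be in mp3 format, with a total length of about 1 minute.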
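The format and length requirements can be checked before uploading. This is a rough sketch that reads each clip's duration with ffprobe (which ships with the ffmpeg package installed earlier) and compares the total against one minute. The 15-second tolerance and the training_data path are assumptions chosen for illustration, not values from RealChar or ElevenLabs, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def probe_duration(path: Path) -> float:
    """Ask ffprobe for a clip's length in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def check_dataset(durations: dict, target: float = 60.0,
                  tolerance: float = 15.0) -> list:
    """Check a {filename: seconds} mapping against the mp3 / about-one-minute rule."""
    problems = [f"{name}: not an mp3 file" for name in durations
                if not name.lower().endswith(".mp3")]
    total = sum(durations.values())
    # The tolerance is an assumed interpretation of "about 1 minute".
    if abs(total - target) > tolerance:
        problems.append(f"total length {total:.1f}s is not close to {target:.0f}s")
    return problems


if __name__ == "__main__":
    folder = Path("training_data")  # assumed location of your clips
    if folder.is_dir():
        clips = {p.name: probe_duration(p)
                 for p in sorted(folder.iterdir()) if p.is_file()}
        print(check_dataset(clips) or "dataset looks fine")
```

Background noise and pronunciation still have to be judged by ear; the script only covers the two mechanical requirements.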

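b. Create an ElevenLabs account

Go to ElevenLabs and create an account. You need it to access the speech synthesis and voice cloning features.

Get your ELEVEN_LABS_API_KEY:

Click the profile icon and select "Profile".
Copy the API key.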

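c. Speech synthesis / voice cloning

Follow these steps to clone a voice:

Go to the Speech Synthesis page.
Click "Add Voice".
Click "Add Generative or Cloned Voice".
Click "Instant Voice Cloning".
Fill in all required information and upload your audio samples.
Click "Add Voice".

d. Test your voice

To test the voice you just created:

Go back to the Speech Synthesis page.
Under "Settings", select the voice you just created.
Type some text and click "Generate".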

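e. Fine-tune your voice

You can make the voice read better by adjusting the system and user prompts. Some tips:

If the voice is too monotone, lower the stability to make it more emotional. However, setting the stability to zero sometimes produces a strange accent.
Longer sentences tend to be spoken better because they give the AI speaker more context to work with.
For shorter sentences that are spoken too fast, replace "." with "...". Add "-" or a line break to insert a pause.
Add emotion-related words or phrases, or use punctuation such as "!" and "?", to add emotion to the voice.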
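Two of these tips can be applied in code when sending text to ElevenLabs: lowering the stability setting, and turning sentence-ending periods into "..." so that short sentences are read more slowly. The sketch below builds a JSON body with the voice_settings fields of the ElevenLabs text-to-speech endpoint. The default values 0.5 and 0.75 and both function names are assumptions for illustration; this is not how RealChar itself builds its requests.

```python
import json
import re


def add_pauses(text: str) -> str:
    """Replace sentence-ending periods with "..." to slow short sentences down."""
    # Match a single period (not part of an existing "...") that is
    # followed by whitespace or the end of the text; "3.14" is left alone.
    return re.sub(r"(?<!\.)\.(?!\.)(?=\s|$)", "...", text)


def build_tts_payload(text: str, stability: float = 0.5,
                      similarity_boost: float = 0.75) -> str:
    """Build a JSON request body with the given voice settings."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    return json.dumps({
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    })
```

For example, build_tts_payload(add_pauses("I am here. Kneel."), stability=0.3) asks for a more expressive reading with longer pauses. Whether that sounds better depends on the voice, so it is worth comparing a few stability values by ear.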

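f. Use your custom voice in our project

You need the voice ID of the cloned voice. Here is how to get it:

Go to Text To Speech - ElevenLabs
Select the Get Voices API
Follow the instructions and find the specific voice_id in the Responses.
Do not forget to update your .env file with your ELEVEN_LABS_API_KEY and the voice ID.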
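The voice ID can also be looked up from a script instead of the documentation page. This sketch calls the ElevenLabs "get voices" endpoint with the xi-api-key header and searches the returned voices list by name. The function names and the placeholder voice name "Loki" are hypothetical; use whatever name you gave the cloned voice.

```python
import json
import os
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key: str) -> dict:
    """Call the ElevenLabs "get voices" endpoint and return the parsed JSON."""
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def find_voice_id(payload: dict, name: str):
    """Return the voice_id whose name matches (case-insensitive), or None."""
    for voice in payload.get("voices", []):
        if voice.get("name", "").lower() == name.lower():
            return voice.get("voice_id")
    return None


if __name__ == "__main__":
    key = os.environ.get("ELEVEN_LABS_API_KEY")
    if key:
        # "Loki" is a placeholder for the name of your cloned voice.
        print(find_voice_id(fetch_voices(key), "Loki"))
```

The printed ID is the value to copy into .env next to the API key.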

2. Character generation

(1) Character directory layout

character_catalog
├── ai_character_helper
│   ├── data
│   │   ├── background
│   │   ├── xxx.md
│   ├── system
│   └── user
├── loki
...

Each folder is one AI character, for example loki.
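Since each folder corresponds to one character, the available characters can be enumerated by listing the subdirectories of the catalog. A minimal sketch follows; the catalog path is an assumption based on the tree above, and list_characters is a hypothetical helper rather than RealChar's own loader.

```python
from pathlib import Path


def list_characters(catalog: Path) -> list:
    """Return the names of the character folders inside the catalog, sorted."""
    return sorted(
        entry.name for entry in catalog.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    catalog = Path("realtime_ai_character/character_catalog")  # assumed path
    if catalog.is_dir():
        print(list_characters(catalog))
```

Plain files placed next to the character folders are ignored, so only directories such as ai_character_helper and loki are reported.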
