Topic: Processing emails with a large language model (LLM) via LangChain
Background:
I am quite new to LangChain and Python, as I mainly work in C#, but I am interested in using AI on my own data. So I wrote some Python code using LangChain that:
1. Gets my emails via IMAP
2. Creates JSON from my emails (JSONLoader)
3. Creates a vector database where each mail is a vector (FAISS, OpenAIEmbeddings)
4. Does a similarity search for the query, returning the 3 mails that match the query best
5. Feeds the result of the similarity search to the LLM (GPT 3.5 Turbo), together with the query again
The LLM prompt then looks something like:
The question is
{query}
Here are some information that can help you to answer the question:
{similarity_search_result}
OK, so far so good. When my question is:
When was my last mail sent to xyz@gmail.com?
I get a correct answer, e.g. last mail received 10.04.2024 14:11.
But what if I want an answer to the following question?
How many mails have been sent by xyz@gmail.com?
Because the similarity search only returns the vectors that are most similar, how can I get an answer about the total count? Even if the similarity search delivered all 150 mails sent by xyz@gmail.com instead of just 3, I can't simply feed them all into the LLM prompt, right?
So what is my mistake here?
Solution:
It sounds like you need what OpenAI calls "function calling" / tools. RAG is great for grabbing relevant documents to dump into the context window, but as you've seen, it's not suitable for everything. Thankfully, function calling lets us add arbitrary capabilities without building our own hacky solution. You first implement a function in Python that does what you want. When you query OpenAI, you provide a description of these functions (tools). The chat completions API can reason about your request and then respond with JSON containing the arguments for the function you defined; calling the function with those arguments is still up to your code. In principle, this lets LLMs take any action a human could.
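To make the division of labour concrete: the model only returns a function name and a JSON string of arguments, and your own code has to look the function up and call it. Here is a minimal sketch of that step. `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical names chosen for illustration, not part of the OpenAI library, and the function body is a stand-in.

```python
import json


def total_number_of_emails(email_address):
    # Stand-in implementation; a real version would query the mailbox.
    return 42


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"total_number_of_emails": total_number_of_emails}


def dispatch_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Run the local function the model asked for and return its result."""
    if name not in registry:
        raise KeyError(f"model requested an unknown tool: {name}")
    kwargs = json.loads(arguments_json)  # arguments arrive as a JSON string
    return registry[name](**kwargs)
```

Keeping the lookup in a dictionary means adding another tool later only requires one more entry, and an unexpected tool name fails loudly instead of being silently ignored.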
So, for your case of getting the number of emails for an email address, you'd want to implement a Python function that, for example, queries the number of emails for a given user via IMAP. I'll leave that task to you, but once you have it, the code below should serve as a minimal working example to build on.
import json
from openai import OpenAI

client = OpenAI(api_key='YOUR API KEY')

# Description of the available tools, sent along with every request.
tools = [
    {
        "type": "function",
        "function": {
            "name": "total_number_of_emails",
            "description": "Get the number of emails in an email user's inbox",
            "parameters": {
                "type": "object",
                "properties": {
                    "email_address": {
                        "type": "string",
                        "description": "The user's email address",
                    },
                },
                "required": ["email_address"],
            },
        },
    },
]


def total_number_of_emails(email_address):
    return 42  # replace with real code to grab # of emails


def test(query):
    cpl = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': query}],
        tools=tools,
        tool_choice='auto',  # lets model decide whether to use a tool
    )
    # tool_calls is None when the model answers without calling a tool.
    for tool_call in cpl.choices[0].message.tool_calls or []:
        fn = tool_call.function
        if fn.name == 'total_number_of_emails':
            args = json.loads(fn.arguments)  # arguments arrive as a JSON string
            print(total_number_of_emails(args['email_address']))


test('How many mails have been sent by xyz@gmail.com?')
If you copy and paste the above code, add your API key, and run it, it should print "42", provided the model chooses to call the tool (with tool_choice='auto' that decision is left to the model).
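As a starting point for the part left as an exercise, here is a rough sketch of how the placeholder could count messages with Python's standard `imaplib`. The helper name `count_emails_from` and the choice of the `INBOX` folder are assumptions for illustration. `conn` is expected to be an already logged-in `imaplib.IMAP4_SSL` connection, or any object with compatible `select` and `search` methods.

```python
def count_emails_from(email_address, conn):
    """Count messages in INBOX whose From header contains email_address.

    conn is assumed to be an authenticated imaplib.IMAP4 / IMAP4_SSL object.
    """
    # The address ends up inside a quoted IMAP string, so reject characters
    # that could break out of the quotes (it may come from model output).
    if any(c in email_address for c in '"\\\r\n'):
        raise ValueError("unexpected character in email address")

    conn.select("INBOX", readonly=True)  # read-only: do not change flags
    status, data = conn.search(None, "FROM", f'"{email_address}"')
    if status != "OK":
        raise RuntimeError(f"IMAP search failed: {status}")

    # data[0] is a space-separated bytes string of message numbers,
    # e.g. b"1 4 9"; it is empty when nothing matches.
    ids = data[0].split() if data and data[0] else []
    return len(ids)
```

With a real server you would create the connection once, for example with `imaplib.IMAP4_SSL(host)` followed by `login(user, password)`, and call this helper from inside `total_number_of_emails`. Because the search runs on the mail server, the count covers every matching message rather than only the few that a similarity search happens to return.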