LangChain 0.2 - 对话式RAG

文章目录

- 一、项目说明
- 二、设置
- - 1、引入依赖
  - 2、LangSmith
- 三、Chains
- - 1、添加聊天记录
  - - Contextualizing the question
    - 聊天记录状态管理
  - 2、合并
- 四、Agents
- - 1、检索工具
  - 2、代理建造者
  - 3、合并
- 五、下一步

本文翻译整理自：Conversational RAG
https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/

一、项目说明

在许多问答应用程序中，我们希望允许用户进行来回对话，这意味着应用程序需要对过去的问题和答案进行某种“记忆”，以及将这些内容合并到当前思维中的一些逻辑。

在本指南中，我们重点介绍**如何添加合并历史消息的逻辑。**有关聊天历史管理的更多详细信息请参见此处。

我们将介绍两种方法：

链，我们总是在其中执行检索步骤；
代理，我们让法学硕士自行决定是否以及如何执行检索步骤（或多个步骤）。

对于外部知识源，我们将使用RAG 教程中由 Lilian Weng 撰写的相同LLM Powered Autonomous Agents博客文章。

二、设置

1、引入依赖

在本演练中，我们将使用 OpenAI 嵌入和 Chroma 矢量存储，但此处显示的所有内容都适用于任何Embeddings、VectorStore或Retriever。

我们将使用以下包：

%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-chroma bs4

我们需要设置环境变量OPENAI_API_KEY，可以直接完成或从.env文件加载，如下所示：

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

# import dotenv

# dotenv.load_dotenv()

2、LangSmith

您使用 LangChain 构建的许多应用程序将包含多个步骤以及多次调用 LLM 调用。随着这些应用程序变得越来越复杂，能够检查链或代理内部到底发生了什么变得至关重要。最好的方法是使用LangSmith。

请注意，LangSmith 不是必需的，但它很有帮助。如果您确实想使用 LangSmith，请在上面的链接注册后，确保设置环境变量以开始记录跟踪：

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

三、Chains

让我们首先重新审视一下我们在 RAG 教程中根据 Lilian Weng 撰写的 LLM Powered Autonomous Agents博客文章构建的问答应用程序。

pip install -qU langchain-openai

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

import bs4
from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load, chunk and index the contents of the blog to create a retriever.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# 2. Incorporate the retriever into a question-answering chain.
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
response["answer"]

'Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable. This process can be achieved through techniques like Chain of Thought (CoT) or Tree of Thoughts, which help agents plan and execute tasks effectively by dividing them into sequential subgoals. Task decomposition can be facilitated by using prompting techniques, task-specific instructions, or human inputs to guide the agent through the steps required to accomplish a task.'

请注意，我们使用了内置的链构造函数create_stuff_documents_chain和create_retrieval_chain，因此我们的解决方案的基本要素是：

retriever;
prompt;
LLM.

这将简化合并聊天历史记录的过程。

1、添加聊天记录

我们构建的链直接使用输入查询来检索相关上下文。但在对话设置中，用户查询可能需要理解对话上下文。例如，考虑以下交换：

人：“什么是任务分解？”

AI：“任务分解涉及将复杂的任务分解为更小、更简单的步骤，使代理或模型更易于管理。”

人类：“常见的做法是什么？”

为了回答第二个问题，我们的系统需要理解“它”指的是“任务分解”。

我们需要更新现有应用程序的两件事：

提示：更新我们的提示以支持历史消息作为输入。
情境化问题：添加一个子链，该子链接收最新的用户问题，并在聊天历史的背景下重新表述它。这可以简单地看作是构建一个新的“历史感知”检索器。而之前我们有：
- query->retriever
  现在我们将有：
- (query, conversation history)-> LLM-> rephrased query->retriever

Contextualizing the question

首先，我们需要定义一个子链，该子链接收历史消息和最新的用户问题，并且如果它引用了历史信息中的任何信息，则重新制定该问题。

我们将使用包含名为“chat_history”的变量的提示MessagesPlaceholder。这允许我们使用“chat_history”输入键将消息列表传递给提示，这些消息将插入到系统消息之后和包含最新问题的人工消息之前。

请注意，我们利用辅助函数create_history_aware_retriever来完成这一步，该函数管理为空的情况chat_history，否则按顺序应用 prompt | llm | StrOutputParser() | retriever。

create_history_aware_retriever构造一个接受键input和chat_history作为输入的链，并具有与检索器相同的输出模式。

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

API参考：create_history_aware_retriever | MessagesPlaceholder

该链将输入查询的改写添加到我们的检索器中，以便检索包含对话的上下文。

现在我们可以构建完整的 QA 链。这就像将检索器更新为我们的新检索器一样简单history_aware_retriever。

再次，我们将使用create_stuff_documents_chain来生成一个question_answer_chain，其中输入键为context，，chat_history和input——它接受检索到的上下文以及对话历史记录和查询来生成答案。

我们用 create_retrieval_chain构建最终 rag_chain。此链按顺序应用history_aware_retriever和question_answer_chain，保留中间输出（例如检索到的上下文）以方便使用。它具有输入键input和chat_history，并且在其输出中包含input，chat_history、context 和answer 。

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

API参考：create_retrieval_chain | create_stuff_documents_chain

让我们尝试一下。下面我们提出一个问题和一个后续问题，这些问题需要语境化才能返回合理的答案。

因为我们的链包含一个"chat_history"输入，所以调用者需要管理聊天记录。我们可以通过将输入和输出消息附加到列表中来实现这一点：

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_2["answer"])

API Reference:AIMessage | HumanMessage

Task decomposition can be done in several common ways, such as using Language Model (LLM) with simple prompting like "Steps for XYZ" or asking for subgoals to achieve a specific task. Task-specific instructions can also be provided, like requesting a story outline for writing a novel. Additionally, human inputs can be utilized to decompose tasks into smaller components effectively.

查看LangSmith 跟踪

聊天记录状态管理

这里我们介绍了如何添加应用程序逻辑来整合历史输出，但我们仍需要手动更新聊天记录并将其插入到每个输入中。在真正的问答应用程序中，我们需要某种方式来保存聊天记录，以及某种方式来自动插入和更新聊天记录。

为此我们可以使用：

BaseChatMessageHistory：存储聊天记录。
RunnableWithMessageHistory：LCEL 链的包装器，BaseChatMessageHistory用于将聊天历史记录注入输入并在每次调用后更新它。

有关如何一起使用这些类来创建有状态对话链的详细演练，请转到如何添加消息历史记录（内存） LCEL 页面。

下面我们实现第二种方案的简单示例，聊天记录存储在一个简单的字典中。LangChain 管理与Redis和其他技术的内存集成，以提供更强大的持久性。

实例RunnableWithMessageHistory为您管理聊天历史记录。它们接受带有键的配置（"session_id"默认情况下），该键指定要获取并添加到输入中的对话历史记录，并将输出附加到相同的对话历史记录。以下是一个例子：

from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

API Reference:ChatMessageHistory | BaseChatMessageHistory | RunnableWithMessageHistory

conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

'Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model. This process helps in guiding the agent through the various subgoals required to achieve the overall task efficiently. Different techniques like Chain of Thought and Tree of Thoughts can be used to decompose tasks into manageable components.'

conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

'Task decomposition can be achieved through various methods such as using prompting techniques like "Steps for XYZ" to guide the model through subgoals, providing task-specific instructions like "Write a story outline" for specific tasks, or incorporating human inputs to break down complex tasks. These approaches help in dividing a large task into smaller, more manageable components for better understanding and execution.'

对话历史记录可以在store字典中检查：

for message in store["abc123"].messages:
    if isinstance(message, AIMessage):
        prefix = "AI"
    else:
        prefix = "User"

    print(f"{prefix}: {message.content}\n")

User: What is Task Decomposition?

AI: Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model. This process helps in guiding the agent through the various subgoals required to achieve the overall task efficiently. Different techniques like Chain of Thought and Tree of Thoughts can be used to decompose tasks into manageable components.

User: What are common ways of doing it?

AI: Task decomposition can be achieved through various methods such as using prompting techniques like "Steps for XYZ" to guide the model through subgoals, providing task-specific instructions like "Write a story outline" for specific tasks, or incorporating human inputs to break down complex tasks. These approaches help in dividing a large task into smaller, more manageable components for better understanding and execution.

2、合并

为了方便起见，我们将所有必要的步骤整合到一个代码单元中：

import bs4
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

***
### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

***
### Contextualize question ###
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

***
### Answer question ###
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

***
### Statefully manage chat history ###
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

'Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. This process helps agents or models tackle difficult tasks by dividing them into more easily achievable subgoals. Task decomposition can be done through techniques like Chain of Thought or Tree of Thoughts, which guide the model in thinking step by step or exploring multiple reasoning possibilities at each step.'

conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

"Common ways of task decomposition include using techniques like Chain of Thought (CoT) or Tree of Thoughts to guide models in breaking down complex tasks into smaller steps. This can be achieved through simple prompting with LLMs, task-specific instructions, or human inputs to help the model understand and navigate the task effectively. Task decomposition aims to enhance model performance on complex tasks by utilizing more test-time computation and shedding light on the model's thinking process."

四、Agents

代理利用 LLM 的推理能力在执行过程中做出决策。使用代理可以让您在检索过程中减轻一些判断力。虽然它们的行为比链更难预测，但它们在这方面具有一些优势：

代理直接为检索器生成输入，而不一定需要我们明确地构建语境化，就像我们上面所做的那样；
代理可以执行多个检索步骤来处理查询，或者完全不执行检索步骤（例如，响应用户的一般问候）。

1、检索工具

代理可以访问“工具”并管理其执行。在本例中，我们将把检索器转换为 LangChain 工具，供代理使用：

from langchain.tools.retriever import create_retriever_tool

tool = create_retriever_tool(
    retriever,
    "blog_post_retriever",
    "Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]

API Reference:create_retriever_tool

工具是 LangChain Runnables，并实现通常的接口：

tool.invoke("task decomposition")

'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\n\nFig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:'

2、代理建造者

现在我们已经定义了工具和 LLM，我们可以创建代理。我们将使用LangGraph来构建代理。目前我们使用高级接口来构建代理，但 LangGraph 的好处在于，这个高级接口由低级、高度可控的 API 支持，以防您想要修改代理逻辑。

from langgraph.prebuilt import chat_agent_executor

agent_executor = chat_agent_executor.create_tool_calling_executor(llm, tools)

我们现在可以尝试一下。请注意，到目前为止它还不是有状态的（我们仍然需要添加内存）

query = "What is Task Decomposition?"

for s in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
):
    print(s)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_demTlnha4vYA1IH6CByYupBQ', 'function': {'arguments': '{"query":"Task Decomposition"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 68, 'total_tokens': 87}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d1c3f3da-be18-46a5-b3a8-4621ba1f7f2a-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Task Decomposition'}, 'id': 'call_demTlnha4vYA1IH6CByYupBQ'}])]}}
----
{'action': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\n\nFig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:', name='blog_post_retriever', id='e83e4002-33d2-46ff-82f4-fddb3035fb6a', tool_call_id='call_demTlnha4vYA1IH6CByYupBQ')]}}
----
{'agent': {'messages': [AIMessage(content='Task decomposition is a technique used in autonomous agent systems to break down complex tasks into smaller and simpler steps. This approach helps agents better understand and plan for the various steps involved in completing a task. One common method for task decomposition is the Chain of Thought (CoT) technique, where models are prompted to "think step by step" to decompose hard tasks into manageable steps. Another approach, known as Tree of Thoughts, extends CoT by exploring multiple reasoning possibilities at each step and creating a tree structure of tasks.\n\nTask decomposition can be achieved through various methods, such as using simple prompts for language models, task-specific instructions, or human inputs. By breaking down tasks into smaller components, agents can effectively plan and execute tasks with greater efficiency.\n\nIn summary, task decomposition is a valuable strategy for autonomous agents to tackle complex tasks by breaking them down into smaller, more manageable steps.', response_metadata={'token_usage': {'completion_tokens': 177, 'prompt_tokens': 588, 'total_tokens': 765}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-808f32b9-ae61-4f31-a55a-f30643594282-0')]}}
----

LangGraph 具有内置持久性，因此我们不需要使用 ChatMessageHistory！相反，我们可以直接将检查点传递给我们的 LangGraph 代理

from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string(":memory:")

agent_executor = chat_agent_executor.create_tool_calling_executor(
    llm, tools, checkpointer=memory
)

这就是我们构建对话 RAG 代理所需要的全部内容。

让我们观察一下它的行为。请注意，如果我们输入不需要检索步骤的查询，则代理不会执行该查询：

config = {"configurable": {"thread_id": "abc123"}}

for s in agent_executor.stream(
    {"messages": [HumanMessage(content="Hi! I'm bob")]}, config=config
):
    print(s)
    print("----")

{'agent': {'messages': [AIMessage(content='Hello Bob! How can I assist you today?', response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 67, 'total_tokens': 78}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-1451e59b-b135-4776-985d-4759338ffee5-0')]}}
----

此外，如果我们输入确实需要检索步骤的查询，代理会生成工具的输入：

query = "What is Task Decomposition?"

for s in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]}, config=config
):
    print(s)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_ab2x4iUPSWDAHS5txL7PspSK', 'function': {'arguments': '{"query":"Task Decomposition"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 91, 'total_tokens': 110}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-f76b5813-b41c-4d0d-9ed2-667b988d885e-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Task Decomposition'}, 'id': 'call_ab2x4iUPSWDAHS5txL7PspSK'}])]}}
----
{'action': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\n\nFig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:', name='blog_post_retriever', id='e0895fa5-5d41-4be0-98db-10a83d42fc2f', tool_call_id='call_ab2x4iUPSWDAHS5txL7PspSK')]}}
----
{'agent': {'messages': [AIMessage(content='Task decomposition is a technique used in complex tasks where the task is broken down into smaller and simpler steps. This approach helps in managing and solving difficult tasks by dividing them into more manageable components. One common method for task decomposition is the Chain of Thought (CoT) technique, which prompts the model to think step by step and decompose hard tasks into smaller steps. Another extension of CoT is the Tree of Thoughts, which explores multiple reasoning possibilities at each step by creating a tree structure of thought steps.\n\nTask decomposition can be achieved through various methods, such as using language models with simple prompting, task-specific instructions, or human inputs. By breaking down tasks into smaller components, agents can better plan and execute complex tasks effectively.\n\nIf you would like more detailed information or examples related to task decomposition, feel free to ask!', response_metadata={'token_usage': {'completion_tokens': 165, 'prompt_tokens': 611, 'total_tokens': 776}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-13296566-8577-4d65-982b-a39718988ca3-0')]}}
----

上面，代理没有将我们的查询逐字插入到工具中，而是删除了不必要的单词，例如“什么”和“是”。

同样的原则也允许代理在必要时使用对话的上下文：

query = "What according to the blog post are common ways of doing it? redo the search"

for s in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]}, config=config
):
    print(s)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_KvoiamnLfGEzMeEMlV3u0TJ7', 'function': {'arguments': '{"query":"common ways of task decomposition"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 930, 'total_tokens': 951}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-dd842071-6dbd-4b68-8657-892eaca58638-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'common ways of task decomposition'}, 'id': 'call_KvoiamnLfGEzMeEMlV3u0TJ7'}])]}}
----
{'action': {'messages': [ToolMessage(content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n\nResources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.', name='blog_post_retriever', id='c749bb8e-c8e0-4fa3-bc11-3e2e0651880b', tool_call_id='call_KvoiamnLfGEzMeEMlV3u0TJ7')]}}
----
{'agent': {'messages': [AIMessage(content='According to the blog post, common ways of task decomposition include:\n\n1. Using language models with simple prompting like "Steps for XYZ" or "What are the subgoals for achieving XYZ?"\n2. Utilizing task-specific instructions, for example, using "Write a story outline" for writing a novel.\n3. Involving human inputs in the task decomposition process.\n\nThese methods help in breaking down complex tasks into smaller and more manageable steps, facilitating better planning and execution of the overall task.', response_metadata={'token_usage': {'completion_tokens': 100, 'prompt_tokens': 1475, 'total_tokens': 1575}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-98b765b3-f1a6-4c9a-ad0f-2db7950b900f-0')]}}
----

请注意，代理能够推断出我们的查询中的“it”指的是“任务分解”，并生成合理的搜索查询作为结果 - 在本例中为“任务分解的常见方法”。

3、合并

为了方便起见，我们将所有必要的步骤整合到一个代码单元中：

import bs4
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_chroma import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string(":memory:")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

***
### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

***
### Build retriever tool ###
tool = create_retriever_tool(
    retriever,
    "blog_post_retriever",
    "Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]

agent_executor = chat_agent_executor.create_tool_calling_executor(
    llm, tools, checkpointer=memory
)