LLM Large Language Models (10): A Custom LangChain Agent Using a Custom LLM

Background

Deploy ChatGLM3-6B as a standalone service that exposes an HTTP API.

Write a custom LLM class that wraps access to ChatGLM3-6B.

Create a simple Agent that uses the custom LLM.

Wrapping the LLM ourselves (MyChatGLM)

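The previous article, LLM Large Language Models (9): Wrapping a Custom LLM in LangChain (CSDN blog), already showed how to wrap a custom LLM in LangChain.

This article builds a simple wrapper around a locally deployed ChatGLM3-6B.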

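import json
from typing import Any, List, Optional

import requests
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM


class MyChatGLM(LLM):
    # Model name and OpenAI-compatible endpoint of the locally deployed ChatGLM3-6B
    model: str = "chatglm3-6b"
    url: str = "http://localhost:8000/v1/chat/completions"

    @property
    def _llm_type(self) -> str:
        return "MyChatGLM"

    def _resp_process_mock(self, resp: str) -> str:
        # Wrap the raw service response in a "Final Answer" action blob so the
        # structured chat Agent finishes after a single round (explained below)
        final_answer_json = {
            "action": "Final Answer",
            "action_input": resp
        }
        return f"""
Action: 
```
{json.dumps(final_answer_json, ensure_ascii=False)}
```"""

    def _call(self, prompt: str, stop: Optional[List[str]] = None, run_manager: Optional[CallbackManagerForLLMRun] = None, **kwargs: Any) -> str:
        # Single-turn request: the whole prompt is sent as one user message, no history.
        # `stop` is ignored: the mock always returns a Final Answer, so no Observation follows.
        data = {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
        }
        resp = self.doRequest(data)
        return self._resp_process_mock(resp)

    def doRequest(self, payload: dict) -> str:
        # Request headers
        headers = {"content-type": "application/json"}
        # Send the payload as a JSON body
        res = requests.post(self.url, json=payload, headers=headers)
        # Fail loudly on HTTP errors instead of handing an error page to the Agent
        res.raise_for_status()
        return res.text


# Guarded so that importing this module (e.g. from my_chatglm3) does not send a request
if __name__ == "__main__":
    mllm = MyChatGLM()
    print(mllm._llm_type)
    # _llm_type is a read-only property and cannot be reassigned (e.g. mllm._llm_type = "haha")
    print(mllm.invoke("hello world!"))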

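Implementation of the _call() method

This example is a simple single-turn Q&A and keeps no conversation history.

Internally, _call() sends the prompt as a single user message to the locally deployed ChatGLM3-6B service via an HTTP POST request.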
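The body that _call() posts is a plain OpenAI-style chat-completions payload. A minimal sketch of building it, assuming the service accepts the same `model`/`messages` fields that `MyChatGLM` uses. It uses the standard library's `urllib` instead of `requests` so it has no dependencies, and it builds the request without sending it. The `history` parameter is a hypothetical extension (this article keeps no history) that shows where earlier turns would go.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Endpoint used by MyChatGLM in this article (local ChatGLM3-6B service)
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str,
                  history: Optional[List[Dict[str, str]]] = None,
                  model: str = "chatglm3-6b") -> Dict:
    """Build the JSON body for one chat-completions request.

    `history` is hypothetical: earlier {"role": ..., "content": ...} turns
    placed before the new user message. The caller's list is not modified.
    """
    messages = list(history or [])
    messages.append({"role": "user", "content": prompt})
    return {"model": model, "messages": messages}


def build_request(payload: Dict, url: str = URL) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request (not sent here)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(build_payload("hello world!"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually call the service: urllib.request.urlopen(req, timeout=60).read()
```

Keeping history would only mean passing earlier user/assistant turns into `messages`; the endpoint and headers stay the same.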

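The _resp_process_mock() method

This method wraps the LLM's response in the format the Agent expects and always returns action: Final Answer, with the raw response as action_input.

The reason for returning it this way is explained below.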

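Custom Agent

The Agent is LangChain's structured chat Agent, created with create_structured_chat_agent, which drives the conversation through structured JSON actions.

The structured chat Agent supports tools that take multiple inputs. This example does not use any tools, so _resp_process_mock() returns action: Final Answer directly.

action: Final Answer signals that the reasoning loop, during which the Agent may call tools, has finished (see below for details).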
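To see why returning "action": "Final Answer" ends the run, here is a simplified stdlib approximation of how the structured chat Agent reads the LLM's text: it takes the first JSON blob wrapped in triple backticks, treats Final Answer as "finish", and treats any other action as a tool call. This is a sketch, not LangChain's actual output parser; the regex, the tuple return values and the tool name "search" are assumptions for illustration.

```python
import json
import re
from typing import Any, Tuple

FENCE = "`" * 3  # a literal triple backtick

# Simplified stand-in for the structured chat format: the first JSON blob
# wrapped in triple backticks, optionally tagged "json".
BLOB_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_agent_output(text: str) -> Tuple[str, Any]:
    """Return ("finish", answer) or ("tool", (tool_name, tool_input)).

    Raises ValueError when no valid blob is found, roughly the case that
    AgentExecutor's handle_parsing_errors=True is meant to absorb.
    """
    match = BLOB_RE.search(text)
    if match is None:
        raise ValueError(f"no JSON blob found in: {text!r}")
    blob = json.loads(match.group(1))  # JSONDecodeError is a ValueError
    if blob["action"] == "Final Answer":
        return "finish", blob["action_input"]
    return "tool", (blob["action"], blob["action_input"])


if __name__ == "__main__":
    final = json.dumps({"action": "Final Answer", "action_input": "hi"})
    print(parse_agent_output("Action:\n" + FENCE + "\n" + final + "\n" + FENCE))
    # prints: ('finish', 'hi')
    # "search" is a hypothetical tool name
    call = json.dumps({"action": "search", "action_input": "weather"})
    print(parse_agent_output("Action:\n" + FENCE + "json\n" + call + "\n" + FENCE))
    # prints: ('tool', ('search', 'weather'))
```

Since _resp_process_mock() always produces the first shape, the Agent never takes the tool branch in this article.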

from langchain import hub
from langchain.agents import AgentExecutor, create_structured_chat_agent
# from langchain_community.tools.tavily_search import TavilySearchResults  # only needed if the search tool is enabled
from my_chatglm3 import MyChatGLM  # the MyChatGLM class above, saved as my_chatglm3.py

if __name__ == "__main__":
    # tools = [TavilySearchResults(max_results=1)]
    tools = []
    # Pull the structured chat prompt from LangChain Hub
    prompt = hub.pull("hwchase17/structured-chat-agent")
    # Choose the LLM to use
    llm = MyChatGLM()

    # Construct the agent
    agent = create_structured_chat_agent(llm, tools, prompt)
    # Create an agent executor by passing in the agent and tools
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
    # Input: "I'm feeling down, tell me a joke to cheer me up"
    result = agent_executor.invoke({"input": "我心情不好，给我讲个笑话逗我开心"})
    print(result["output"])

ChatPromptTemplate structured-chat-agent

The prompt used is hwchase17/structured-chat-agent from LangChain Hub. Rendered for this run, it looks like this:

System: Respond to the human as helpfully and accurately as possible. You have access to the following tools:


Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or 
Provide only ONE action per $JSON_BLOB, as shown:
```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "Final response to human"
}

Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation

Human: 我心情不好，给我讲个笑话逗我开心
 (reminder to respond in a JSON blob no matter what)
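In the rendered prompt above, the tool list is blank and the Valid "action" values line ends abruptly with "or". As far as I can tell, create_structured_chat_agent fills the template's {tools} placeholder with a text description of each tool and {tool_names} with a comma-separated list of tool names, so with tools = [] both become empty strings. A stdlib sketch using a shortened excerpt of the template and hypothetical name/description dicts (the real Agent uses LangChain tool objects, and its exact description format differs):

```python
from typing import Dict, List

# Shortened excerpt of the structured-chat system prompt (not the full template)
TEMPLATE_EXCERPT = (
    "You have access to the following tools:\n\n{tools}\n\n"
    'Valid "action" values: "Final Answer" or {tool_names}'
)


def render_excerpt(tools: List[Dict[str, str]]) -> str:
    """Fill the placeholders from hypothetical {"name", "description"} tool dicts."""
    tools_text = "\n".join(f"{t['name']}: {t['description']}" for t in tools)
    tool_names = ", ".join(t["name"] for t in tools)
    return TEMPLATE_EXCERPT.format(tools=tools_text, tool_names=tool_names)


if __name__ == "__main__":
    # tools = []: blank tool list, and the "Valid action values" line ends with "or "
    print(render_excerpt([]))
    print("---")
    # One hypothetical tool, for comparison
    print(render_excerpt([{"name": "search", "description": "web search"}]))
```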

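Part 1 declares the System message. Much like the prompt engineering we do when chatting with an LLM, it sets the role and other context.

Part 2 explains how to use tools and how to reason step by step. Tools are not covered in this article; I will write about them in a later post. Note the constraint on the reasoning process: it ends when the LLM returns action: Final Answer. If the LLM's output to the Agent never contains action: Final Answer, the Agent keeps reasoning in a loop that only AgentExecutor's max_iterations limit (15 by default) will break.

That is why _resp_process_mock() returns action: Final Answer right away, so the conversation is over after a single round.

Part 3 declares the output format.

Part 4 is the human user's input: 我心情不好，给我讲个笑话逗我开心 ("I'm feeling down, tell me a joke to cheer me up").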
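The reasoning loop described above can be simulated without LangChain. A minimal, hypothetical sketch: run_agent calls an LLM function repeatedly, stops as soon as it sees a Final Answer, and otherwise gives up after max_iterations steps (mirroring the purpose of AgentExecutor's max_iterations, whose default of 15 is reused here). The fake LLMs, the crude is_final check and the "search" tool are assumptions for illustration; a real executor would also run the tool and feed back its Observation.

```python
import json
from typing import Callable, List, Tuple

FENCE = "`" * 3  # a literal triple backtick


def final_answer_blob(text: str) -> str:
    """Same shape as MyChatGLM._resp_process_mock(): always a Final Answer action."""
    blob = {"action": "Final Answer", "action_input": text}
    return f"\nAction: \n{FENCE}\n{json.dumps(blob, ensure_ascii=False)}\n{FENCE}"


def is_final(llm_output: str) -> Tuple[bool, str]:
    """Crude check: read the JSON between the first pair of fences."""
    blob = json.loads(llm_output.split(FENCE)[1])
    return blob["action"] == "Final Answer", blob["action_input"]


def run_agent(llm: Callable[[str], str], question: str,
              max_iterations: int = 15) -> Tuple[str, int]:
    """Hypothetical agent loop; returns (answer or stop message, steps taken)."""
    scratchpad: List[str] = []
    for step in range(1, max_iterations + 1):
        output = llm(question + "\n" + "\n".join(scratchpad))
        done, value = is_final(output)
        if done:
            return value, step
        # A real executor would run the requested tool here; we fake its result.
        scratchpad.append(output + "\nObservation: (fake tool result)")
    return "stopped: iteration limit reached", max_iterations


if __name__ == "__main__":
    # Like MyChatGLM: every reply is a Final Answer, so the loop ends after 1 step.
    print(run_agent(lambda p: final_answer_blob("a joke"), "tell me a joke"))
    # prints: ('a joke', 1)
    # An LLM that keeps requesting a (hypothetical) "search" tool never finishes.
    tool_call = "\nAction: \n" + FENCE + "\n" + json.dumps(
        {"action": "search", "action_input": "x"}) + "\n" + FENCE
    print(run_agent(lambda p: tool_call, "tell me a joke", max_iterations=3))
    # prints: ('stopped: iteration limit reached', 3)
```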

Agent execution result

> Entering new AgentExecutor chain...

Action:
```
{"action": "Final Answer", 
"action_input": "{\"model\":\"chatglm3-6b\",\"id\":\"\",\"object\":\"chat.completion\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":\"{\\n  \\\"action\\\": \\\"Final Answer\\\",\\n  \\\"action_input\\\": \\\"A joke for you: Why was the math book sad? Because it had too many problems.\\\"\\n}\",\"name\":null,\"function_call\":null},\"finish_reason\":\"stop\"}],\"created\":1712419428,\"usage\":{\"prompt_tokens\":297,\"total_tokens\":339,\"completion_tokens\":42}}"}
```

> Finished chain.

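As you can see, the reasoning process consists of a single action: Final Answer.

Here action_input is the raw response of the ChatGLM3-6B service, i.e. the complete HTTP response body rather than just the model's text.

Inside it, the model's actual answer is "A joke for you: Why was the math book sad? Because it had too many problems." Note that ChatGLM3 itself formatted this answer as a Final Answer JSON blob, following the prompt.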
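Because action_input holds the raw HTTP response body, the joke is buried two JSON layers deep: it sits in choices[0].message.content of the response, and that content is itself the model's own {"action": ..., "action_input": ...} blob. A minimal sketch of unwrapping it, assuming the response always has the OpenAI-style shape seen in the log above; extract_content and extract_answer are hypothetical helper names. Something like this could be applied to resp before it is handed to _resp_process_mock().

```python
import json
from typing import Any, Dict


def extract_content(resp_text: str) -> str:
    """Return choices[0].message.content from an OpenAI-style chat completion body."""
    body: Dict[str, Any] = json.loads(resp_text)
    return body["choices"][0]["message"]["content"]


def extract_answer(resp_text: str) -> str:
    """Unwrap the model's own Final Answer blob if present, else return the raw content."""
    content = extract_content(resp_text)
    try:
        blob = json.loads(content)
    except json.JSONDecodeError:
        return content  # the model answered in plain text
    if isinstance(blob, dict) and blob.get("action") == "Final Answer":
        return str(blob.get("action_input", ""))
    return content


if __name__ == "__main__":
    # Response body rebuilt from the execution log in this article (fields trimmed)
    inner = {"action": "Final Answer",
             "action_input": "A joke for you: Why was the math book sad? "
                             "Because it had too many problems."}
    resp_text = json.dumps({
        "model": "chatglm3-6b",
        "object": "chat.completion",
        "choices": [{"index": 0,
                     "message": {"role": "assistant",
                                 "content": json.dumps(inner, indent=2)},
                     "finish_reason": "stop"}],
    })
    print(extract_answer(resp_text))
    # prints: A joke for you: Why was the math book sad? Because it had too many problems.
```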