基于RAG的问答机器人
前置条件
什么是RAG
https://blog.csdn.net/m0_56699208/article/details/138063866?spm=1001.2014.3001.5502
quickstart
构建
概括地说,任何 SQL 链和 agent 的步骤如下:
- 将问题转换为 SQL 查询:模型将用户输入转换为 SQL 查询。
- 执行 SQL 查询:执行 SQL 查询。
- 回答问题:模型使用 查询结果。
环境配置
安装依赖包
%pip install --upgrade --quiet langchain langchain-community langchain-openai
环境变量
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
# Uncomment the below to use LangSmith. Not required.
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
下面的示例将使用与 Chinook 数据库的 SQLite 连接。 按照这些安装进行操作 在与此笔记本相同的目录中创建的步骤:Chinook.db
- 保存 此内容 文件为
Chinook_Sqlite.sql
- 跑
sqlite3 Chinook.db
- 跑
.read Chinook_Sqlite.sql
- 测试
SELECT * FROM Artist LIMIT 10
现在,在我们的目录中,我们可以使用 SQLAlchemy 与它交互 :
from langchain_community.utilities import SQLDatabase
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.dialect)
print(db.get_usable_table_names())
db.run("SELECT * FROM Artist LIMIT 10;")
构造链
我们要创造一个工作链,让模型接受一个问题,将其转换为 SQL query,执行查询,并使用结果来回答原始问题
将问题转换为 SQL 查询
使用 create_sql_query_chain
from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = create_sql_query_chain(llm, db)
response = chain.invoke({"question": "How many employees are there"})
response
得到的 response:
'SELECT COUNT(*) FROM Employee'
完整 prompt
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use date('now') function to get the current date, if the question involves "today".
Use the following format:
Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here
Only use the following tables:
{table_info}
Question: {input}
执行 SQL 查询
现在我们已经生成了一个 SQL 查询,我们将需要执行它。这是创建 SQL 链最危险的部分。
使用 QuerySQLDatabaseTool
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
execute_query = QuerySQLDataBaseTool(db=db)
write_query = create_sql_query_chain(llm, db)
chain = write_query | execute_query
chain.invoke({"question": "How many employees are there"})
回答问题
现在,我们已经有了自动生成和执行查询的方法, 我们只需要将原始问题和 SQL 查询结果结合起来 生成最终答案。我们可以通过传递问题和结果来做到这一点:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
answer_prompt = PromptTemplate.from_template(
"""Given the following user question, corresponding SQL query, and SQL result, answer the user question.
Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)
answer = answer_prompt | llm | StrOutputParser()
chain = (
RunnablePassthrough.assign(query=write_query).assign(
result=itemgetter("query") | execute_query
)
| answer
)
chain.invoke({"question": "How many employees are there"})
对于更复杂的查询生成,我们可能希望创建 few shot 提示或添加查询检查步骤。下面是技术参考:
- 提示策略
- 查询检查
- 大型数据库