目录
- 前言
- 一、LangChain介绍
- 二、在SQL问答时如何更好的提示?
- 2-1、安装
- 2-2、SQLite 样例数据
- 2-3、使用langchain与其进行交互
- 2-4、查看模型提示语
- 2-5、提供表定义和示例行
- 2-6、将表信息插入到Prompt中去
- 2-7、添加自然语言->SQL示例
- 2-8、在向量数据库中查找最相关的提示词
- 总结
前言
LangChain是一个框架,用于开发由大型语言模型(LLM)驱动的应用程序。一、LangChain介绍
LangChain是一个框架,用于开发由大型语言模型(LLM)驱动的应用程序。
LangChain 简化了 LLM 应用程序生命周期的每个阶段:
- 开发:使用LangChain的开源构建块和组件构建应用程序。使用第三方集成和模板开始运行。
- 生产化:使用 LangSmith 检查、监控和评估您的链条,以便您可以自信地持续优化和部署。
- 部署:使用 LangServe 将任何链转换为 API。
二、在SQL问答时如何更好的提示?
2-1、安装
pip install --upgrade --quiet langchain langchain-community langchain-experimental langchain-openai
2-2、SQLite 样例数据
参考:https://database.guide/2-sample-databases-sqlite/
Chinook 数据: 它代表了一个数字媒体商店,包括艺术家、专辑、媒体曲目、发票和客户的信息,以表格形式呈现。
1、创建数据库: 使用sqlite3 命令来创建
sqlite3 Chinook.db
2、sql脚本下载、运行
sql脚本地址: https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql
# 将脚本粘贴到Chinook_Sqlite.sql文件内后,执行以下命令可以创建数据库表。
.read Chinook_Sqlite.sql
2-3、使用langchain与其进行交互
我们可以使用SQLAlchemy驱动的SQLDatabase类与它交互:
from langchain_community.utilities import SQLDatabase
db = SQLDatabase.from_uri("sqlite:///Chinook.db", sample_rows_in_table_info=3)
print(db.dialect)
print(db.get_usable_table_names())
print(db.run("SELECT * FROM Artist LIMIT 10;"))
输出:
sqlite
[‘Album’, ‘Artist’, ‘Customer’, ‘Employee’, ‘Genre’, ‘Invoice’, ‘InvoiceLine’, ‘MediaType’, ‘Playlist’, ‘PlaylistTrack’, ‘Track’]
[(1, ‘AC/DC’), (2, ‘Accept’), (3, ‘Aerosmith’), (4, ‘Alanis Morissette’), (5, ‘Alice In Chains’), (6, ‘Antônio Carlos Jobim’), (7, ‘Apocalyptica’), (8, ‘Audioslave’), (9, ‘BackBeat’), (10, ‘Billy Cobham’)]
优化:
from langchain_community.utilities import SQLDatabase
import os
db_path = os.path.join(os.path.dirname(__file__), 'Chinook.db')
db_full_path = os.path.abspath(db_path)
db = SQLDatabase.from_uri(f"sqlite:///{db_full_path}")
2-4、查看模型提示语
安装:
pip install -qU langchain-openai
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
from langchain.chains import create_sql_query_chain
chain = create_sql_query_chain(llm, db)
chain.get_prompts()[0].pretty_print()
输出:
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use date(‘now’) function to get the current date, if the question involves “today”.
Use the following format:
Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here
Only use the following tables:
{table_info}
Question: {input}
None
Notice:我这里使用的是阿里的模型,对传入的llm要做一个修改, 使用OpenAI的不需要修改。
from langchain_community.chat_models.tongyi import ChatTongyi
# 环境变量设置,模型接口设置
os.environ["LANGCHAIN_TRACING_V2"] = ""
os.environ["LANGCHAIN_API_KEY"] = ""
os.environ["DASHSCOPE_API_KEY"] = ''
model = ChatTongyi(
streaming=True,
)
2-5、提供表定义和示例行
概述: 在大多数SQL链中,我们至少需要向模型提供部分数据库大纲。没有这个,它将无法编写有效的查询。我们的数据库提供了一些方便的方法来提供相关的上下文。具体来说,我们可以从每个表中获取表名、表的概要和行示例。
context = db.get_context()
print(list(context))
print(context["table_info"])
输出: 只截取部分。
2-6、将表信息插入到Prompt中去
prompt_with_context = chain.get_prompts()[0].partial(table_info=context["table_info"])
print(prompt_with_context.pretty_repr()[:1500])
输出:
2-7、添加自然语言->SQL示例
概述: 在Prompt中包含将自然语言问题转换为针对数据库的有效SQL查询的示例,通常会提高模型性能,特别是对于复杂查询。
examples = [
{"input": "List all artists.", "query": "SELECT * FROM Artist;"},
{
"input": "Find all albums for the artist 'AC/DC'.",
"query": "SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');",
},
{
"input": "List all tracks in the 'Rock' genre.",
"query": "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');",
},
{
"input": "Find the total duration of all tracks.",
"query": "SELECT SUM(Milliseconds) FROM Track;",
},
{
"input": "List all customers from Canada.",
"query": "SELECT * FROM Customer WHERE Country = 'Canada';",
},
{
"input": "How many tracks are there in the album with ID 5?",
"query": "SELECT COUNT(*) FROM Track WHERE AlbumId = 5;",
},
{
"input": "Find the total number of invoices.",
"query": "SELECT COUNT(*) FROM Invoice;",
},
{
"input": "List all tracks that are longer than 5 minutes.",
"query": "SELECT * FROM Track WHERE Milliseconds > 300000;",
},
{
"input": "Who are the top 5 customers by total purchase?",
"query": "SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5;",
},
{
"input": "Which albums are from the year 2000?",
"query": "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
},
{
"input": "How many employees are there",
"query": 'SELECT COUNT(*) FROM "Employee"',
},
]
构建提示词模板:
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
example_prompt = PromptTemplate.from_template("User input: {input}\nSQL query: {query}")
prompt = FewShotPromptTemplate(
examples=examples[:5],
example_prompt=example_prompt,
prefix="You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run. Unless otherwise specificed, do not return more than {top_k} rows.\n\nHere is the relevant table info: {table_info}\n\nBelow are a number of examples of questions and their corresponding SQL queries.",
suffix="User input: {input}\nSQL query: ",
input_variables=["input", "top_k", "table_info"],
)
print(prompt.format(input="How many artists are there?", top_k=3, table_info="foo"))
输出:
You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run. Unless otherwise specificed, do not return more than 3 rows.
Here is the relevant table info: foo
Below are a number of examples of questions and their corresponding SQL queries.
User input: List all artists.
SQL query: SELECT * FROM Artist;
User input: Find all albums for the artist ‘AC/DC’.
SQL query: SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = ‘AC/DC’);
User input: List all tracks in the ‘Rock’ genre.
SQL query: SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = ‘Rock’);
User input: Find the total duration of all tracks.
SQL query: SELECT SUM(Milliseconds) FROM Track;
User input: List all customers from Canada.
SQL query: SELECT * FROM Customer WHERE Country = ‘Canada’;
User input: How many artists are there?
SQL query:
2-8、在向量数据库中查找最相关的提示词
参考文章:
userGuide
How to better prompt when doing SQL question-answering
How to do query validation as part of SQL question-answering
How to deal with large databases when doing SQL question-answering
How to do question answering over CSVs
SQL Database
总结
今天是疯狂星期三,我吃了必胜客😍