I. Building a knowledge base from the Notion sample data
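This example builds the knowledge vector store with Faiss, one of the vector stores supported by the LangChain framework. Install the GPU build of Faiss with the following command: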
pip install faiss-gpu
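For simplicity, the vector store is saved to disk as files. The steps are as follows:
1. Import the required packages: CharacterTextSplitter for splitting the knowledge documents, the FAISS vector store, and HuggingFaceEmbeddings for encoding text as vectors.
2. Load the Notion sample data, which consists of markdown files.
3. Because an LLM has a limited context length, the documents must be split into chunks; here chunk_size is set to 1500 characters.
4. Next, choose a model that converts text to vectors, here the open-source HuggingFace model "shibing624/text2vec-base-multilingual", and use Faiss to build the knowledge vector store and save it to disk.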
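A minimal sketch of steps 1 to 4, under these assumptions: the Notion export sits in a folder named `Notion_DB`, the store is written to a folder named `faiss_index` (both hypothetical names), and recent `langchain-community`, `langchain-text-splitters` and `sentence-transformers` packages are installed. Older LangChain releases expose the same classes under `langchain.text_splitter`, `langchain.vectorstores` and `langchain.embeddings`. The markdown files are read directly with `pathlib` rather than through a LangChain document loader, and every chunk keeps its file path as `source` metadata, which is what later appears as the cited source.

```python
from pathlib import Path
from typing import List, Tuple

NOTION_DIR = "Notion_DB"    # hypothetical folder holding the exported Notion markdown
INDEX_DIR = "faiss_index"   # hypothetical output folder for the FAISS store
EMBED_MODEL = "shibing624/text2vec-base-multilingual"


def collect_markdown(root: str) -> Tuple[List[str], List[str]]:
    """Read every .md file under root (recursively); return (texts, relative paths)."""
    texts, sources = [], []
    for path in sorted(Path(root).glob("**/*.md")):
        texts.append(path.read_text(encoding="utf-8"))
        sources.append(str(path.relative_to(root)))
    return texts, sources


def build_vector_store(root: str = NOTION_DIR, out_dir: str = INDEX_DIR) -> None:
    # LangChain imports are deferred so collect_markdown() works without them installed
    from langchain_text_splitters import CharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    texts, sources = collect_markdown(root)

    # Merge newline-separated pieces into chunks of up to 1500 characters
    splitter = CharacterTextSplitter(chunk_size=1500, separator="\n")
    chunks, metadatas = [], []
    for text, source in zip(texts, sources):
        pieces = splitter.split_text(text)
        chunks.extend(pieces)
        metadatas.extend({"source": source} for _ in pieces)  # one dict per chunk

    embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    store = FAISS.from_texts(chunks, embeddings, metadatas=metadatas)
    store.save_local(out_dir)  # writes index.faiss and index.pkl into out_dir
```

Calling `build_vector_store()` once is enough; later sessions only need to load the saved folder with the same embedding model.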
II. Loading the ChatGLM3 model
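The open-source chatglm3-6b model can be obtained from huggingface.co.
Alternatively, specify a local cache directory the first time the model is downloaded; later runs can then load the model directly from the cache and skip the download.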
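A sketch of loading chatglm3-6b with `transformers`, assuming the Hub id `THUDM/chatglm3-6b`, a CUDA GPU, and a hypothetical cache folder `./model_cache`. `cache_dir` decides where the first download is stored, and `local_files_only=True` makes later runs read only from that cache. Because RetrievalQA expects a LangChain LLM object, the sketch also includes a hypothetical `ChatGLM3LLM` wrapper whose `_call` forwards the prompt to the model's `chat` method; this is one common way to connect the two, not necessarily the one used in this article.

```python
from typing import Any, Dict, List, Optional

MODEL_ID = "THUDM/chatglm3-6b"  # Hugging Face Hub id of chatglm3-6b
CACHE_DIR = "./model_cache"     # hypothetical local cache folder


def load_kwargs(cache_dir: str = CACHE_DIR, offline: bool = False) -> Dict[str, Any]:
    """Keyword arguments shared by the tokenizer and model loaders."""
    return {
        "trust_remote_code": True,    # chatglm3-6b ships its own modelling code
        "cache_dir": cache_dir,       # first run downloads here; later runs reuse the files
        "local_files_only": offline,  # True: read from the cache only, never contact the Hub
    }


def load_chatglm3(cache_dir: str = CACHE_DIR, offline: bool = False):
    from transformers import AutoModel, AutoTokenizer  # deferred: heavy dependency

    kwargs = load_kwargs(cache_dir, offline)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, **kwargs)
    # fp16 weights on the default CUDA device, switched to inference mode
    model = AutoModel.from_pretrained(MODEL_ID, **kwargs).half().cuda().eval()
    return tokenizer, model


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut text at the earliest stop sequence, as LangChain expects from _call."""
    cut = len(text)
    for s in stop or []:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def make_chatglm3_llm(tokenizer: Any, model: Any):
    """Wrap a loaded chatglm3-6b model as a LangChain LLM (hypothetical wrapper)."""
    from langchain_core.language_models.llms import LLM

    class ChatGLM3LLM(LLM):
        tokenizer: Any = None
        model: Any = None

        @property
        def _llm_type(self) -> str:
            return "chatglm3-6b"

        def _call(self, prompt: str, stop: Optional[List[str]] = None,
                  run_manager: Any = None, **kwargs: Any) -> str:
            # ChatGLM3's remote code: model.chat(tokenizer, query, history) -> (response, history)
            response, _ = self.model.chat(self.tokenizer, prompt, history=[])
            return truncate_at_stop(response, stop)

    return ChatGLM3LLM(tokenizer=tokenizer, model=model)
```

Typical use: `tokenizer, model = load_chatglm3()` on the first run, `load_chatglm3(offline=True)` afterwards, then `llm = make_chatglm3_llm(tokenizer, model)`.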
III. Demo: the LLM answers user questions from the knowledge base through LangChain
Here the LangChain RetrievalQA API is used to query the knowledge base. The main steps are as follows:
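1. Load the vector store files from disk.
2. Design a prompt template.
3. Use RetrievalQA to pass the results retrieved from the knowledge base to the LLM and get its answer.
4. For the demo, build a query input box, and extract the answer and the cited knowledge-base documents (sources) from the LLM's result.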
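Steps 1 to 4 can be sketched as follows. Assumptions: the store was saved to a folder named `faiss_index` with the same embedding model, `llm` is any LangChain LLM (for example a ChatGLM3 wrapper), the prompt wording is hypothetical, recent `langchain`, `langchain-core` and `langchain-community` packages are installed (recent versions require `allow_dangerous_deserialization=True` because the docstore is pickled), and a plain `input()` loop stands in for the query box. With `return_source_documents=True`, RetrievalQA returns a dict whose `"result"` holds the answer and whose `"source_documents"` hold the retrieved chunks; `format_answer` collects the file paths from their `source` metadata.

```python
from typing import Any, Dict, List

INDEX_DIR = "faiss_index"  # hypothetical folder written when the store was built
EMBED_MODEL = "shibing624/text2vec-base-multilingual"

# Hypothetical prompt; the "stuff" chain fills {context} with the retrieved chunks
PROMPT_TEMPLATE = """Answer the question using only the context below. If the context does not contain the answer, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""


def format_answer(result: Dict[str, Any]) -> str:
    """Turn a RetrievalQA result dict into the answer plus de-duplicated sources."""
    sources: List[str] = []
    for doc in result.get("source_documents", []):
        src = doc.metadata.get("source", "unknown")
        if src not in sources:
            sources.append(src)
    lines = [result["result"].strip(), "", "Sources:"]
    lines += [f"- {s}" for s in sources]
    return "\n".join(lines)


def build_qa(llm: Any):
    from langchain.chains import RetrievalQA
    from langchain_core.prompts import PromptTemplate
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    # 1. Load the store with the same embedding model used to build it
    embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    db = FAISS.load_local(INDEX_DIR, embeddings, allow_dangerous_deserialization=True)
    # 2. Prompt template with the variables the "stuff" chain expects
    prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])
    # 3. Retrieve the top 4 chunks, stuff them into the prompt, and keep the sources
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )


def interactive(qa: Any) -> None:
    """4. Minimal stand-in for the query box: answer questions until an empty line."""
    while True:
        question = input("Question: ").strip()
        if not question:
            break
        print(format_answer(qa.invoke({"query": question})))
```

With an `llm` object in hand, `interactive(build_qa(llm))` runs the demo.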
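5. Next, test whether the LLM answers user questions correctly from the knowledge base, starting with a question in Chinese.
The result is as follows:
The cited sources include this file:
Blendles Employee Handbook a834d55573614857a48a9ce9ec4194e3/Your 1st month 5f253fc3413b427f8df1c4d0155ac153.md. Opening this document shows that it explicitly sets the trial period at one month ("Your first month is officially your trial period"), yet the answer above says 3 months, which clearly contradicts the knowledge base (see the document content below):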
# Your 1st month
Hey you! Welcome to Blendle. Buckle up, you're in for one hell of a ride :).
The faster you get settled in the better, so we came up with a structure for your first month to make sure you have a smooth start.
- **Structure 1st month**
Your first month is officially your trial period, which means both parties can terminate the contract if we figure out it is a total mismatch. In Blendle history, this happened once. Your first month is **not** an extended hiring process. As you might have noticed, the hiring process was pretty thorough and you made it through. This means we are pretty sure this is a good match. But, life happens so it's good to have a safeguard like the trial period, just don't feel any extra pressure plz.
- **Before day 1:** your contract is signed and all the administrative stuff is taken care of. After that you got your on-boarding e-mail which probably led you here. This e-mail contains all the basic knowledge and acces codes to get you going.
- **Day 1:** **Start** around 10:00 (depending on your team, check with them), receive your laptop and Blendle stuff, get settled in (where is the toilet, how does the coffee machine work). Your buddy will get you started in terms of work.
……
6. Next, ask the same question in English.
This time the LLM answers correctly: the trial period is 1 month.