FlashRAG

文章目录

- 一、关于 FlashRAG
- - 特点 ✨
  - 🔧 安装
- 二、快速入门🏃
- - - 1、Toy Example
    - 2、使用现成的管道
    - 3、建立自己的管道
    - 4、只需使用组件
- 三、组件⚙️
- - - 1、RAG 组件
    - 2、管道
- 四、支持方法🤖
- 五、支持数据集📓
- 六、其他常见问题解答 🙌

一、关于 FlashRAG

github ： https://github.com/RUC-NLPIR/FlashRAG
paper : FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
https://arxiv.org/pdf/2405.13576

FlashRAG 是一个用于复现和开发检索增强生成 (RAG) 研究的 Python 工具包。

工具包包括 32 个预处理的基准 RAG 数据集和 12 个最先进的 RAG 算法。

在这里插入图片描述

借助 FlashRAG 和提供的资源，您可以轻松地在 RAG 域中重现现有的 SOTA 作品或实现自定义的 RAG 流程和组件。

FlashRAG 是根据MIT 许可证授权的。

特点 ✨

🛠 广泛且可定制的框架：包括 RAG 场景的基本组件，例如检索器、重新排序器、生成器和压缩器，允许灵活组装复杂的管道。
🗂 全面的基准数据集：32 个预处理的 RAG 基准数据集的集合，用于测试和验证 RAG 模型的性能。
🎯 预实现的高级 RAG 算法：基于我们的框架，具有 12 种先进的 RAG 算法，并报告了结果。可轻松在不同设置下重现结果。
🧩 高效的预处理阶段：通过提供检索的语料库处理、检索索引构建和文档预检索等各种脚本，简化 RAG 工作流程准备。
🚀 优化执行：通过 vLLM、用于 LLM 推理加速的 FastChat 和用于向量索引管理的 Faiss 等工具增强了库的效率。

🔧 安装

要开始使用 FlashRAG，只需从 Github 克隆并安装（需要 Python 3.9+）：

git clone https://github.com/RUC-NLPIR/FlashRAG.git
cd FlashRAG
pip install -e .

二、快速入门🏃

1、Toy Example

运行以下代码，使用提供的玩具数据集实现一个简单的 RAG 管道。

默认检索器是e5，默认生成器是llama2-7B-chat。

您需要在以下命令中填写相应的模型路径。

如果您希望使用其他模型，请参阅下面的详细说明。

cd examples/quick_start
python simple_pipeline.py \
    --model_path=<LLAMA2-7B-Chat-PATH> \
    --retriever_path=<E5-PATH>

代码完成后可以在相应路径下的输出文件夹里查看运行的中间结果以及最终的评估分数。

注意： 此示例仅用于测试整个流程是否能正常运行，我们的示例检索文档仅包含 1000 条数据，因此可能无法获得良好的结果。

2、使用现成的管道

您可以使用我们已经构建好的pipeline类（如pipelines所示）来实现里面的RAG流程，这种情况下，只需要配置config，加载对应的pipeline即可。

首先加载整个流程的config，里面记录了RAG流程中需要用到的各种超参数，可以把yaml文件作为参数输入，也可以直接作为变量输入，变量作为输入的优先级高于文件。

from flashrag.config import Config

config_dict = {'data_dir': 'dataset/'}
my_config = Config(config_file_path = 'my_config.yaml',
                config_dict = config_dict)

您可以参考我们提供的基础yaml文件来设置自己的参数，具体参数名称及含义请参考config参数说明。

接下来加载相应的数据集，并初始化管道。管道中的组件将被自动加载。

from flashrag.utils import get_dataset
from flashrag.pipeline import SequentialPipeline
from flashrag.prompt import PromptTemplate
from flashrag.config import Config

config_dict = {'data_dir': 'dataset/'}
my_config = Config(config_file_path = 'my_config.yaml',
                config_dict = config_dict)
all_split = get_dataset(my_config)
test_data = all_split['test']

pipeline = SequentialPipeline(my_config)

您可以使用以下方式指定自己的输入提示PromptTemplete：

prompt_templete = PromptTemplate(
    config, 
    system_prompt = "Answer the question based on the given document. Only give me the answer and do not output any other words.\nThe following are given documents.\n\n{reference}",
    user_prompt = "Question: {question}\nAnswer:"
)
pipeline = SequentialPipeline(my_config, prompt_template=prompt_templete)

最后执行pipeline.run即可得到最终结果。

output_dataset = pipeline.run(test_data, do_eval=True)

包含output_dataset输入数据集中每项的中间结果和度量分数。

同时，包含中间结果和总体评估分数的数据集也将保存为文件（如果指定了save_intermediate_data和save_metric_score）。

3、建立自己的管道

有时你可能需要实现更复杂的RAG流程，你可以构建自己的流水线来实现，只需要继承BasicPipeline，初始化你需要的组件，完成run功能即可。

from flashrag.pipeline import BasicPipeline
from flashrag.utils import get_retriever, get_generator

class ToyPipeline(BasicPipeline):
  def __init__(self, config, prompt_templete=None):
    # Load your own components
    pass

  def run(self, dataset, do_eval=True):
    # Complete your own process logic

    # get attribute in dataset using `.`
    input_query = dataset.question
    ...
    # use `update_output` to save intermeidate data
    dataset.update_output("pred",pred_answer_list)
    dataset = self.evaluate(dataset, do_eval=do_eval)
    return dataset

了解您需要使用的组件的输入和输出形式：

https://github.com/RUC-NLPIR/FlashRAG/blob/main/docs/basic_usage.md

4、只需使用组件

如果您已经有自己的代码，只想使用我们的组件来嵌入原有的代码，您可以参考组件的基本介绍来获取各个组件的输入输出格式。

三、组件⚙️

在 FlashRAG 中，我们构建了一系列常用的 RAG 组件，包括检索器、生成器、refiners 等。

基于这些组件，我们组装了多个管道来实现 RAG 工作流，同时还提供了灵活性，可以按自定义方式组合这些组件以创建您自己的管道。

1、RAG 组件

Type	Module	Description
Judger	SKR Judger	Judging whether to retrieve using SKRmethod
Retriever	Dense Retriever	Bi-encoder models such as dpr, bge, e5, using faiss for search
	BM25 Retriever	Sparse retrieval method based on Lucene
	Bi-Encoder Reranker	Calculate matching score using bi-Encoder
	Cross-Encoder Reranker	Calculate matching score using cross-encoder
Refiner	Extractive Refiner	Refine input by extracting important context
	Abstractive Refiner	Refine input through seq2seq model
	LLMLingua Refiner	LLMLingua-series prompt compressor
	SelectiveContext Refiner	Selective-Context prompt compressor
Generator	Encoder-Decoder Generator	Encoder-Decoder model, supporting Fusion-in-Decoder (FiD)
	Decoder-only Generator	Native transformers implementation
	FastChat Generator	Accelerate with [FastChat](
	vllm Generator	Accelerate with vllm

2、管道

参考有关检索增强生成的调查，我们根据推理路径将 RAG 方法分为四类。

顺序：RAG 过程的顺序执行，如查询-（检索前）-检索器-（检索后）生成器
条件：针对不同类型的输入查询实现不同的路径
分支：并行执行多个路径，合并每个路径的响应
循环：迭代执行检索和生成

在每个类别中，我们都实现了相应的通用流程，部分流程还有相应的工作底稿。

Type	Module	Description
Sequential	Sequential Pipeline	Linear execution of query, supporting refiner, reranker
Conditional	Conditional Pipeline	With a judger module, distinct execution paths for various query types
Branching	REPLUG Pipeline	Generate answer by integrating probabilities in multiple generation paths
	SuRe Pipeline	Ranking and merging generated results based on each document
Loop	Iterative Pipeline	Alternating retrieval and generation
	Self-Ask Pipeline	Decompose complex problems into subproblems using self-ask
	Self-RAG Pipeline	Adaptive retrieval, critique, and generation
	FLARE Pipeline	Dynamic retrieval during the generation process

四、支持方法🤖

我们实施了 12 部作品，其一致设定如下：

生成器： LLAMA3-8B-instruct，输入长度为 4096
检索器： e5-base-v2 作为嵌入模型，每个查询检索 5 个文档
提示： 一致的默认提示，模板可以在这里找：
https://github.com/RUC-NLPIR/FlashRAG/blob/main/flashrag/prompt/base_prompt.py

对于开源方法，我们利用我们的框架实现了它们的流程。对于作者没有提供源代码的方法，我们会尽量遵循原论文中的方法进行实现。

对于某些方法特有的必要设置和超参数，我们在具体设置栏中进行了记录。更多详细信息，请查阅我们的代码。

需要注意的是，为了确保一致性，我们使用了统一的设置。但是，此设置可能与该方法的原始设置不同，导致结果与原始结果有所不同。

Method	Type	NQ (EM)	TriviaQA (EM)	Hotpotqa (F1)	2Wiki (F1)	PopQA (F1)	WebQA(EM)	Specific setting
Naive Generation	Sequential	22.6	55.7	28.4	33.9	21.7	18.8
Standard RAG	Sequential	35.1	58.9	35.3	21.0	36.7	15.7
AAR-contriever-kilt	Sequential	30.1	56.8	33.4	19.8	36.1	16.1
LongLLMLingua	Sequential	32.2	59.2	37.5	25.0	38.7	17.5	Compress Ratio=0.5
RECOMP-abstractive	Sequential	33.1	56.4	37.5	32.4	39.9	20.2
Selective-Context	Sequential	30.5	55.6	34.4	18.5	33.5	17.3	Compress Ratio=0.5
Ret-Robust	Sequential	42.9	68.2	35.8	43.4	57.2	33.7	Use LLAMA2-13B with trained lora
SuRe	Branching	37.1	53.2	33.4	20.6	48.1	24.2	Use provided prompt
REPLUG	Branching	28.9	57.7	31.2	21.1	27.8	20.2
SKR	Conditional	25.5	55.9	29.8	28.5	24.5	18.6	Use infernece-time training data
Self-RAG	Loop	36.4	38.2	29.6	25.1	32.7	21.9	Use trained selfrag-llama2-7B
FLARE	Loop	22.5	55.8	28.0	33.9	20.7	20.2
Iter-Retgen, ITRG	Loop	36.8	60.1	38.3	21.6	37.9	18.2

五、支持数据集📓

我们收集并处理了 35 个在 RAG 研究中广泛使用的数据集，并对其进行了预处理，以确保格式一致，方便使用。对于某些数据集（例如 Wiki-asp），我们根据社区内常用的方法对其进行了调整，以适应 RAG 任务的要求。

对于每个数据集，我们将每个分割保存为一个jsonl文件，每行是一个字典，如下所示：

{
  'id': str,
  'question': str,
  'golden_answers': List[str],
  'metadata': dict
}

以下是数据集列表以及相应的样本大小：

Task	Dataset Name	Knowledge Source	# Train	# Dev	# Test
QA	NQ	wiki	79,168	8,757	3,610
QA	TriviaQA	wiki & web	78,785	8,837	11,313
QA	PopQA	wiki	/	/	14,267
QA	SQuAD	wiki	87,599	10,570	/
QA	MSMARCO-QA	web	808,731	101,093	/
QA	NarrativeQA	books and story	32,747	3,461	10,557
QA	WikiQA	wiki	20,360	2,733	6,165
QA	WebQuestions	Google Freebase	3,778	/	2,032
QA	AmbigQA	wiki	10,036	2,002	/
QA	SIQA	-	33,410	1,954	/
QA	CommenseQA	-	9,741	1,221	/
QA	BoolQ	wiki	9,427	3,270	/
QA	PIQA	-	16,113	1,838	/
QA	Fermi	wiki	8,000	1,000	1,000
multi-hop QA	HotpotQA	wiki	90,447	7,405	/
multi-hop QA	2WikiMultiHopQA	wiki	15,000	12,576	/
multi-hop QA	Musique	wiki	19,938	2,417	/
multi-hop QA	Bamboogle	wiki	/	/	125
Long-form QA	ASQA	wiki	4,353	948	/
Long-form QA	ELI5	Reddit	272,634	1,507	/
Open-Domain Summarization	WikiASP	wiki	300,636	37,046	37,368
multiple-choice	MMLU	-	99,842	1,531	14,042
multiple-choice	TruthfulQA	wiki	/	817	/
multiple-choice	HellaSWAG	ActivityNet	39,905	10,042	/
multiple-choice	ARC	-	3,370	869	3,548
multiple-choice	OpenBookQA	-	4,957	500	500
Fact Verification	FEVER	wiki	104,966	10,444	/
Dialog Generation	WOW	wiki	63,734	3,054	/
Entity Linking	AIDA CoNll-yago	Freebase & wiki	18,395	4,784	/
Entity Linking	WNED	Wiki	/	8,995	/
Slot Filling	T-REx	DBPedia	2,284,168	5,000	/
Slot Filling	Zero-shot RE	wiki	147,909	3,724	/