使用Milvus和Llama-agents构建更强大的Agent系统

代理（Agent）系统能够帮助开发人员创建智能的自主系统，因此变得越来越流行。大语言模型（LLM）能够遵循各种指令，是管理 Agent 的理想选择，在许多场景中帮助我们尽可能减少人工干预、处理更多复杂任务。例如，Agent 系统解答客户咨询的问题，甚至根据客户偏好进行交叉销售。

本文将探讨如何使用 Llama-agents 和 Milvus 构建 Agent 系统。通过将 LLM 的强大功能与 Milvus 的向量相似性搜索能力相结合，我们可以创建智能且高效、可扩展的复杂 Agent 系统。

本文还将探讨如何使用不同的 LLM 来实现各种操作。对于较简单的任务，我们将使用规模较小且更价格更低的 Mistral Nemo 模型。对于更复杂的任务，如协调不同 Agent，我们将使用 Mistral Large 模型。

01.

Llama-agents、Ollama、Mistral Nemo 和 Milvus Lite 简介

Llama-agents：LlamaIndex 的扩展，通常与 LLM 配套使用，构建强大的 stateful、多 Actor 应用。
Ollama 和 Mistral Nemo: Ollama 是一个 AI 工具，允许用户在本地计算机上运行大语言模型（如 Mistral Nemo），无需持续连接互联网或依赖外部服务器。
Milvus Lite: Milvus 的轻量版，您可以在笔记本电脑、Jupyter Notebook 或 Google Colab 上本地运行 Milvus Lite。它能够帮助您高效存储和检索非结构化数据。

Llama-agents 原理

LlamaIndex 推出的 Llama-agents 是一个异步框架，可用于构建和迭代生产环境中的多 Agent 系统，包括多代理通信、分布式工具执行、人机协作等功能！

在 Llama-agents 中，每个 Agent 被视为一个服务，不断处理传入的任务。每个 Agent 从消息队列中提取和发布消息。

02.

安装依赖

第一步先安装所需依赖。

! pip install llama-agents pymilvus python-dotenv
! pip install llama-index-vector-stores-milvus llama-index-readers-file llama-index-embeddings-huggingface llama-index-llms-ollama llama-index-llms-mistralai

# This is needed when running the code in a Notebook
import nest_asyncio
nest_asyncio.apply()

from dotenv import load_dotenv
import os

load_dotenv()

03.

将数据加载到 Milvus

从 Llama-index 上下载示例数据。其中包含有关 Uber 和 Lyft 的 PDF 文件。

!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

现在，我们可以提取数据内容，然后使用 Embedding 模型将数据转换为 Embedding 向量，最终存储在 Milvus 向量数据库中。本文使用的模型为 bge-small-en-v1.5 文本 Embedding 模型。该模型较小且资源消耗量更低。

接着，在 Milvus 中创建 Collection 用于存储和检索数据。本文使用 Milvus 轻量版—— Milvus Lite。Milvus 是一款高性能的向量向量数据库，提供向量相似性搜索能力，适用于搭建 AI 应用。仅需通过简单的 pip install pymilvus 命令即可快速安装 Milvus Lite。

PDF 文件被转换为向量，我们将向量数据库存储到 Milvus 中。

from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Define the default Embedding model used in this Notebook.
# bge-small-en-v1.5 is a small Embedding model, it's perfect to use locally
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

input_files=["./data/10k/lyft_2021.pdf", "./data/10k/uber_2021.pdf"]

# Create a single Milvus vector store
vector_store = MilvusVectorStore(
    uri="./milvus_demo_metadata.db",
    collection_name="companies_docs" 
    dim=384,
    overwrite=False,
)

# Create a storage context with the Milvus vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
    
# Load data
docs = SimpleDirectoryReader(input_files=input_files).load_data()

# Build index
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# Define the query engine
company_engine = index.as_query_engine(similarity_top_k=3)

04.

定义工具

我们需要定义两个与我们数据相关的工具。第一个工具提供关于 Lyft 的信息。第二个工具提供关于 Uber 的信息。在后续的内容中，我们将进一步探讨如何定义一个更广泛的工具。

# Define the different tools that can be used by our Agent.
query_engine_tools = [
   QueryEngineTool(
       query_engine=company_engine,
       metadata=ToolMetadata(
           name="lyft_10k",
           description=(
               "Provides information about Lyft financials for year 2021. "
               "Use a detailed plain text question as input to the tool."
           ),
       ),
   ),
   QueryEngineTool(
       query_engine=company_engine,
       metadata=ToolMetadata(
           name="uber_10k",
           description=(
               "Provides information about Uber financials for year 2021. "
               "Use a detailed plain text question as input to the tool."
           ),
       ),
   ),
]

05.

使用 Mistral Nemo 设置 Agent

我们将使用 Mistral Nemo 和 Ollama 限制资源用量、降低应用成本。Mistral Nemo + Ollama 的组合能够帮助我们在本地运行模型。Mistral Nemo 是一个小型 LLM，提供高达 128k Token 的大上下文窗口，这在处理大型文档时非常有用。此外，该 LLM 还经过微调，可以遵循精确的推理指令、处理多轮对话和生成代码。

from llama_index.llms.ollama import Ollama
from llama_index.core.agent import AgentRunner, ReActAgentWorker, ReActAgent

# Set up the agent
llm = Ollama(model="mistral-nemo", temperature=0.4)
agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

# Example usage
response = agent.chat("Compare the revenue of Lyft and Uber in 2021.")
print(response)

生成响应如下所示：

> Running step 7ed275f6-b0de-4fd7-b2f2-fd551e58bfe2. Step input: Compare the revenue of Lyft and Uber in 2021.
Thought: The current language of the user is: English. I need to use tools to help me answer the question.
Action: lyft_10k
Action Input: {'input': "What was Lyft's total revenue in 2021?"}
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
       - Avoid using `tokenizers` before the fork if possible
       - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Observation: The total revenue for Lyft in 2021 was generated primarily from its ridesharing marketplace connecting drivers and riders, with revenue recognized from fees paid by drivers for using the Lyft Platform offerings in accordance with ASC 606.
> Running step 33064fd3-3c3a-42c4-ab5a-e7ebf8a9325b. Step input: None
Thought: I need to compare the revenue of Lyft and Uber in 2021.
Action: uber_10k
Action Input: {'input': "What was Uber's total revenue in 2021?"}
Observation: $17,455
> Running step 7eacfef4-d9da-4cbf-ac07-18f2ff6a3951. Step input: None
Thought: I have obtained Uber's total revenue for 2021. Now, I need to compare it with Lyft's.
Action: lyft_10k
Action Input: {'input': "What was Lyft's total revenue in 2021?"}
Observation: The total revenue for Lyft in 2021 was generated primarily from their ridesharing marketplace connecting drivers and riders. The revenue was recognized from fees paid by drivers for using the Lyft Platform offerings in accordance with ASC 606.
> Running step 88673e15-b74c-4755-8b9c-2b7ef3acea48. Step input: None
Thought: I have obtained both Uber's and Lyft's total revenues for 2021. Now, I need to compare them.
Action: Compare
Action Input: {'Uber': '$17,455', 'Lyft': '$3.6 billion'}
Observation: Error: No such tool named `Compare`.
> Running step bed5941f-74ba-41fb-8905-88525e67b785. Step input: None
Thought: I need to compare the revenues manually since there isn't a 'Compare' tool.
Answer: In 2021, Uber's total revenue was $17.5 billion, while Lyft's total revenue was $3.6 billion. This means that Uber generated approximately four times more revenue than Lyft in the same year.
Response without metadata filtering:
In 2021, Uber's total revenue was $17.5 billion, while Lyft's total revenue was $3.6 billion. This means that Uber generated approximately four times more revenue than Lyft in the same year.

06.

使用 Milvus 进行元数据过滤

虽然为每个公司的文档定义一个工具代理非常方便，但如果需要处理的文档很多，这种方法并不具备良好的扩展性。更好的解决方案是使用 Milvus 提供的元数据过滤功能。我们可以将来自不同公司的数据存储在同一个 Collection 中，但只检索特定公司的相关数据，从而节省时间和资源。

以下代码展示了如何使用元数据过滤功能：

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Example usage with metadata filtering
filters = MetadataFilters(
   filters=[ExactMatchFilter(key="file_name", value="lyft_2021.pdf")]
)

filtered_query_engine = index.as_query_engine(filters=filters)

# Define query engine tools with the filtered query engine
query_engine_tools = [
   QueryEngineTool(
       query_engine=filtered_query_engine,
       metadata=ToolMetadata(
           name="company_docs",
           description=(
               "Provides information about various companies' financials for year 2021. "
               "Use a detailed plain text question as input to the tool."
           ),
       ),
   ),
]

# Set up the agent with the updated query engine tools
agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

我们的检索器将过滤数据，仅考虑属于 lyft_2021.pdf文档的部分数据。因此，我们应该是搜索不到 Uber 相关的信息和内容的。

try:
   response = agent.chat("What is the revenue of uber in 2021?")
   print("Response with metadata filtering:")
   print(response)
except ValueError as err:
   print("we couldn't find the data, reached max iterations")

让我们测试一下。当我们针对 Uber 2021 年的公司收入进行提问时，Agent 没有检索出结果。

Thought: The user wants to know Uber's revenue for 2021.
Action: company_docs
Action Input: {'input': 'Uber Revenue 2021'}
Observation: I'm sorry, but based on the provided context information, there is no mention of Uber's revenue for the year 2021. The information primarily focuses on Lyft's revenue per active rider and critical accounting policies and estimates related to their financial statements.
> Running step c0014d6a-e6e9-46b6-af61-5a77ca857712. Step input: None

但当我们针对 Lyft 2021 年的公司收入进行提问时，Agent 能够检索出正确的答案。

try:
   response = agent.chat("What is the revenue of Lyft in 2021?")
   print("Response with metadata filtering:")
   print(response)
except ValueError as err:
   print("we couldn't find the data, reached max iterations")

返回结果如下：

> Running step 7f1eebe3-2ebd-47ff-b560-09d09cdd99bd. Step input: What is the revenue of Lyft in 2021?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: company_docs
Action Input: {'input': 'Lyft revenue 2021'}
Observation: Lyft's revenue in 2021 was primarily generated from its ridesharing marketplace connecting drivers and riders. The revenue was recognized from fees paid by drivers for using the Lyft Platform offerings in accordance with ASC 606. Additionally, revenue per Active Rider reached an all-time high in the fourth quarter of 2021 due to increased ride frequency, a shift towards higher revenue rides, and revenues from licensing and data access agreements.
> Running step 072a3253-7eee-44e3-a787-397c9cbe80d8. Step input: None
Thought: The current language of the user is English. I need to use a tool to help me answer the question.
Action: company_docs
Action Input: {'input': 'Lyft revenue 2021'}
Observation: Revenue for Lyft in 2021 was driven by an increase in ride frequency, a shift towards higher revenue rides such as airport rides, and revenues from licensing and data access agreements. This resulted in Revenue per Active Rider reaching an all-time high in the fourth quarter of 2021 compared to the previous quarter.
> Running step 6494fe6d-27ad-484f-9204-0c4683bfa1c2. Step input: None
Thought: The user is asking for Lyft's revenue in 2021.
Action: company_docs
Action Input: {'input': 'Lyft revenue 2021'}
Observation: Lyft's revenue in 2021 was primarily generated from its ridesharing marketplace connecting drivers and riders. Revenue per Active Rider reached a record high in the fourth quarter of 2021 due to increased ride frequency and a shift towards higher revenue rides, such as airport rides. Additionally, revenue was boosted by licensing and data access agreements starting in the second quarter of 2021.
> Running step 0076b6dd-e7d0-45ac-a39a-4afa5f1aaf47. Step input: None
Answer: Observation: Lyft's total revenue in 2021 was $3.4 billion.
Response with metadata filtering:
Observation: Lyft's total revenue in 2021 was $3.4 billion.

07.

使用 LLM 自动创建元数据过滤器

现在，让我们基于用户问题使用 LLM 自动创建元数据过滤器，从而提升 Agent 效率。

from llama_index.core.prompts.base import PromptTemplate

# Function to create a filtered query engine
def create_query_engine(question):
   # Extract metadata filters from question using a language model
   prompt_template = PromptTemplate(
       "Given the following question, extract relevant metadata filters.\n"
       "Consider company names, years, and any other relevant attributes.\n"
       "Don't write any other text, just the MetadataFilters object"
       "Format it by creating a MetadataFilters like shown in the following\n"
       "MetadataFilters(filters=[ExactMatchFilter(key='file_name', value='lyft_2021.pdf')])\n"
       "If no specific filters are mentioned, returns an empty MetadataFilters()\n"
       "Question: {question}\n"
       "Metadata Filters:\n"
   )
  
   prompt = prompt_template.format(question=question)
   llm = Ollama(model="mistral-nemo")
   response = llm.complete(prompt)
  
   metadata_filters_str = response.text.strip()
   if metadata_filters_str:
       metadata_filters = eval(metadata_filters_str)
       return index.as_query_engine(filters=metadata_filters)
   return index.as_query_engine()

我们可以将上述 Function 整合到 Agent 中。

# Example usage with metadata filtering
question = "What is Uber revenue? This should be in the file_name: uber_2021.pdf"
filtered_query_engine = create_query_engine(question)

# Define query engine tools with the filtered query engine
query_engine_tools = [
   QueryEngineTool(
       query_engine=filtered_query_engine,
       metadata=ToolMetadata(
           name="company_docs_filtering",
           description=(
               "Provides information about various companies' financials for year 2021. "
               "Use a detailed plain text question as input to the tool."
           ),
       ),
   ),
]

# Set up the agent with the updated query engine tools
agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

response = agent.chat(question)
print("Response with metadata filtering:")
print(response)

现在，Agent 使用键值file_name 和 uber_2021.pdf 来创建 Metadatafilters。Prompt 越复杂，Agent 能够创建更多高级过滤器。

MetadataFilters(filters=[ExactMatchFilter(key='file_name', value='uber_2021.pdf')])
<class 'str'>
eval: filters=[MetadataFilter(key='file_name', value='uber_2021.pdf', operator=<FilterOperator.EQ: '=='>)] condition=<FilterCondition.AND: 'and'>
> Running step a2cfc7a2-95b1-4141-bc52-36d9817ee86d. Step input: What is Uber revenue? This should be in the file_name: uber_2021.pdf
Thought: The current language of the user is English. I need to use a tool to help me answer the question.
Action: company_docs
Action Input: {'input': 'Uber revenue 2021'}
Observation: $17,455 million

08.

使用 Mistral Large 作为编排系统

Mistral Large 是一款比 Mistral Nemo 更强大的模型，但它也会消耗更多资源。如果仅将其用作编排器，我们可以节省部分资源，同时享受智能 Agent 带来的便利。

为什么使用 Mistral Large？

Mistral Large是Mistral AI推出的旗舰型号，具有顶级推理能力，支持复杂的多语言推理任务，包括文本理解、转换和代码生成，非常适合需要大规模推理能力或高度专业化的复杂任务。其先进的函数调用能力正是我们协调不同 Agent 时所需的功能。

我们无需针对每个任务都使用一个重量级的模型，这会对我们的系统造成负担。我们可以使用 Mistral Large 指导其他 Agent 进行特定的任务。这种方法不仅优化了性能，还降低了运营成本，提升系统可扩展性和效率。

Mistral Large 将充当中央编排器的角色，协调由 Llama-agents 管理的多个 Agent 活动：

Task Delegation（分派任务）：当收到复杂查询时，Mistral Large 确定最合适的 Agent 和工具来处理查询的各个部分。
Agent Coordination（代理协调）：Llama-agents 管理这些任务的执行情况，确保每个 Agent 接收到必要的输入，且其输出被正确处理和整合。
Result Synthesis（综合结果）：Mistral Large 然后将来自各个 Agent 的输出编译成一个连贯且全面的响应，确保最终输出大于其各部分的总和。

Llama Agents

将 Mistral Large 作为编排器，并使用 Agent 生成回答。

from llama_agents import (
    AgentService,
    ToolService,
    LocalLauncher,
    MetaServiceTool,
    ControlPlaneServer,
    SimpleMessageQueue,
    AgentOrchestrator,
)

from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.llms.mistralai import MistralAI

# create our multi-agent framework components
message_queue = SimpleMessageQueue()
control_plane = ControlPlaneServer(
    message_queue=message_queue,
    orchestrator=AgentOrchestrator(llm=MistralAI('mistral-large-latest')),
)

# define Tool Service
tool_service = ToolService(
    message_queue=message_queue,
    tools=query_engine_tools,
    running=True,
    step_interval=0.5,
)

# define meta-tools here
meta_tools = [
    await MetaServiceTool.from_tool_service(
        t.metadata.name,
        message_queue=message_queue,
        tool_service=tool_service,
    )
    for t in query_engine_tools
]

# define Agent and agent service
worker1 = FunctionCallingAgentWorker.from_tools(
    meta_tools,
    llm=MistralAI('mistral-large-latest')
)

agent1 = worker1.as_agent()
agent_server_1 = AgentService(
    agent=agent1,
    message_queue=message_queue,
    description="Used to answer questions over differnet companies for their Financial results",
    service_name="Companies_analyst_agent",
)

import logging

# change logging level to enable or disable more verbose logging
logging.getLogger("llama_agents").setLevel(logging.INFO)

## Define Launcher
launcher = LocalLauncher(
   [agent_server_1, tool_service],
   control_plane,
   message_queue,
)

query_str = "What are the risk factors for Uber?"
print(launcher.launch_single(query_str))

> Some key risk factors for Uber include fluctuations in the number of drivers and merchants due to dissatisfaction with the brand, pricing models, and safety incidents. Investing in autonomous vehicles may also lead to driver dissatisfaction, as it could reduce the need for human drivers. Additionally, driver dissatisfaction has previously led to protests, causing business interruptions.

09.

总结

本文介绍了如何使用 Llama-agents 框架创建和使用代理，该框架由 Mistral Nemo 和 Mistral Large 两个不同的大语言模型驱动。我们展示了如何利用不同 LLM 的优势，有效协调资源，搭建一个智能、高效的系统。

如果您喜欢本文内容，请在 GitHub 上为我们点亮🌟https://github.com/milvus-io/milvus。欢迎在 Milvus 社区中分享您的见解！

作者介绍