Author: Gustavo Llermaly, from Elastic
How to deploy and test the new Apple models and build a RAG system using Elastic.
In this article, we will learn how to deploy and test the new Apple models and build a RAG system that emulates Apple Intelligence, using Elastic as the vector database and OpenELM as the model provider.
A notebook with the complete exercise is available.
Introduction
In April, Apple released its family of open, efficient language models (OpenELM) in 270M, 450M, 1.1B, and 3B parameter sizes, each in pretrained and instruction-tuned (instruct) variants. Models with more parameters are generally better at complex tasks but are slower and consume more resources, while smaller models are faster and less demanding. The choice depends on the problem we want to solve.
The creators emphasize the models' relevance from a research perspective: they provide everything needed to train the models, and in some cases show how their models achieve better performance than competitors with fewer parameters.
What makes these models notable is their transparency, since everything needed to reproduce them is open, unlike models that only release the weights and inference code and are pretrained on private datasets.
The framework used to generate and train these models (CoreNet) is also available.
One advantage of the OpenELM models is that they can be ported to MLX, a deep learning framework optimized for devices with Apple Silicon processors, so they can benefit from that technology by training local models for those devices.
Apple has just released its new iPhones, and one of the new features is Apple Intelligence, which uses AI to help with tasks such as notification classification, context-aware recommendations, and email composition.
Let's build an application with Elastic and OpenELM that does the same!
The application flow looks like this:
Steps
- Deploy the model
- Index the data
- Test the model
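The steps above come together at query time as a small pipeline. Here is a minimal sketch of that flow; the function names are illustrative placeholders for the pieces built in the rest of the article, not part of the original code:

```python
# Hypothetical sketch of the runtime RAG flow. Each callable stands in for a
# component implemented later in the article.
def rag_answer(question, retrieve, build_prompt, generate):
    """Retrieve relevant documents, build an instruct prompt, generate an answer."""
    docs = retrieve(question)               # query Elasticsearch
    prompt = build_prompt(question, docs)   # format documents + question into a prompt
    return generate(prompt)                 # run the OpenELM model
```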
Deploying the model
The first step is to deploy the model. You can find complete information about the models here: https://huggingface.co/collections/apple/openelm-instruct-models-6619ad295d7ae9f868b759ca
We will use the instruct models because we want our model to follow instructions rather than chat with us. Instruct models are trained for one-shot requests instead of holding a conversation.
First, we need to clone the repository:
git clone https://huggingface.co/apple/OpenELM
Then, you need to get a Hugging Face access token here.
Next, you need to request access to the Llama-2-7b model on Hugging Face in order to use the OpenELM tokenizer.
After that, run the following command inside the repository folder you just cloned:
python generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
You should get a response similar to this:
Once upon a time there was a man named John Smith. He had been born in the small town of Pine Bluff, Arkansas, and raised by his single mother, Mary Ann. John's family moved to California when he was young, settling in San Francisco where he attended high school. After graduating from high school, John enlisted in the U.S. Army as a machine gunner. John's first assignment took him to Germany, serving with the 1st Battalion, 12th Infantry Regiment. During this time, John learned German and quickly became fluent in the language. In fact, it took him only two months to learn all 3,000 words of the alphabet. John's love for learning led him to attend college at Stanford University, majoring in history. While attending school, John also served as a rifleman in the 1st Armored Division. After completing his undergraduate education, John returned to California to join the U.S. Navy. Upon his return to California, John married Mary Lou, a local homemaker. They raised three children: John Jr., Kathy, and Sharon. John enjoyed spending time with
Done! We can send instructions from the command line, but we want the model to work with our own information.
Indexing the data
Now we'll index some documents in Elastic to use with the model.
To take full advantage of semantic search, make sure to deploy the ELSER model with an inference endpoint:
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
If this is your first time using ELSER, you may have to wait a while. You can check the deployment progress under Kibana > Machine Learning > Trained Models.
Tip: if you haven't deployed your own ELSER model yet, read the article "Elasticsearch: Deploying ELSER - Elastic Learned Sparse Encoder" for details.
Now we'll create the index, which will represent the data on the phone that the agent has access to.
PUT mobile-assistant
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "english",
        "copy_to": "semantic_field"
      },
      "semantic_field": {
        "type": "semantic_text",
        "inference_id": "my-elser-model"
      }
    }
  }
}
We use copy_to so the description field is available for both full-text and semantic search. Now let's add the documents:
POST _bulk
{ "index" : { "_index" : "mobile-assistant", "_id": "email1"} }
{ "title": "Team Meeting Agenda", "description": "Hello team, Let's discuss our project progress in tomorrow's meeting. Please prepare your updates. Best regards, Manager" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email2"} }
{ "title": "Client Proposal Draft", "description": "Hi, I've attached the draft of our client proposal. Could you review it and provide feedback? Thanks, Colleague" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email3"} }
{ "title": "Weekly Newsletter", "description": "This week in tech: AI advancements, new smartphone releases, and cybersecurity updates. Read more on our website!" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email4"} }
{ "title": "Urgent: Project Deadline Update", "description": "Dear team, Due to recent developments, we need to move up our project deadline. The new submission date is next Friday. Please adjust your schedules accordingly and let me know if you foresee any issues. We'll discuss this in detail during our next team meeting. Best regards, Project Manager" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email5"} }
{ "title": "Invitation: Company Summer Picnic", "description": "Hello everyone, We're excited to announce our annual company summer picnic! It will be held on Saturday, July 15th, at Sunny Park. There will be food, games, and activities for all ages. Please RSVP by replying to this email with the number of guests you'll be bringing. We look forward to seeing you there! Best, HR Team" }
Testing the model
Now that we have the data and the model, we just need to connect the two so the model can do what we need.
We start by creating a function to build our system prompt. Since this is an instruct model, it doesn't hold a conversation; it receives an instruction and returns a result.
We will use a chat template to format the prompt.
def build_prompt(question, elasticsearch_documents):
    docs_text = "\n".join([
        f"Title: {doc['title']}\nDescription: {doc['description']}"
        for doc in elasticsearch_documents
    ])
    prompt = f"""<|system|>
You are Elastic Intelligence (EI), a virtual assistant on a cell phone. Answer questions about emails concisely and accurately.
You can only answer based on the context provided by the user.</s>
<|user|>
CONTEXT:
{docs_text}
QUESTION:
{question} </s>
<|assistant|>"""
    return prompt
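One caveat with a 270M-parameter model is its limited context budget. A simple guard, which is my addition rather than part of the original article, is to cap how much retrieved text goes into the prompt:

```python
# Hypothetical helper: cap the retrieved context so the prompt stays small
# enough for a small model. The limit is a rough character budget, not a real
# token count.
def truncate_docs(documents, max_chars=2000):
    kept, used = [], 0
    for doc in documents:
        text = f"Title: {doc['title']}\nDescription: {doc['description']}"
        if used + len(text) > max_chars:
            break
        kept.append(doc)
        used += len(text)
    return kept
```

You could call this on the Elasticsearch hits before passing them to build_prompt.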
Now, using semantic search, let's add a function that fetches relevant documents from Elastic based on the user's question:
def retrieve_documents(question):
    search_body = {
        "query": {
            "semantic": {
                "query": question,
                "field": "semantic_field"
            }
        }
    }
    response = client.search(index=index_name, body=search_body)
    return [hit["_source"] for hit in response["hits"]["hits"]]
Now let's try asking: "Summarize my emails". To make it easier to send the prompt, we'll call the generate function from generate_openelm.py instead of using the CLI.
from OpenELM.generate_openelm import generate

# MODEL and HUGGINGFACE_TOKEN hold the values used earlier, e.g.
# MODEL = "apple/OpenELM-270M-Instruct"
question = "Summarize my emails"
prompt = build_prompt(question, retrieve_documents(question))

output_text, generation_time = generate(
    prompt=prompt,
    model=MODEL,
    hf_access_token=HUGGINGFACE_TOKEN,
)
print("-----GENERATION TIME-----")
print(f"\033[92m {round(generation_time, 2)} \033[0m")
print("-----RESPONSE-----")
print(output_text)
The first answers varied and weren't great. In some cases we got the right answer, but in others we didn't: the model returned details about its reasoning, HTML code, or people not mentioned in the context.
If we restrict the question to a yes/no answer, the model performs better. That makes sense, since it is a small model with weaker reasoning ability.
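Since yes/no questions worked best, one practical pattern, sketched here as my own addition rather than code from the article, is to instruct the model to answer only "yes" or "no" and then parse its output defensively:

```python
# Hypothetical post-processing: small models drift, so read only the first
# token of the generated text and map it to a boolean (or None on failure).
def parse_yes_no(output_text):
    words = output_text.strip().split()
    if not words:
        return None
    first = words[0].strip(".,!").lower()
    if first == "yes":
        return True
    if first == "no":
        return False
    return None  # the model did not follow the instruction
```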
Now let's try a classification task:
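The classification loop itself isn't reproduced above, but it could look roughly like this. The prompt wording and category names are my assumptions, and `generate_fn` stands in for the `generate` call shown earlier:

```python
CATEGORIES = ["work", "personal", "newsletter"]  # assumed categories

def classify_emails(emails, generate_fn):
    """Ask the model for a one-word category per email; keep the first match."""
    results = {}
    for email in emails:
        prompt = (
            f"<|user|>\nClassify this email as one of: {', '.join(CATEGORIES)}. "
            f"Answer with a single word.\n"
            f"Title: {email['title']}\nDescription: {email['description']}</s>\n"
            f"<|assistant|>"
        )
        answer = generate_fn(prompt).strip().lower()
        results[email["title"]] = next((c for c in CATEGORIES if c in answer), "unknown")
    return results
```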
With just a small loop, the model was able to classify the emails correctly. That makes this model attractive for tasks like sorting emails or notifications by topic or relevance. Another important thing to note is how sensitive this kind of model is to prompt variations: small details such as how the task is phrased can produce very different answers.
Experiment with different prompts until you get the result you want.
Conclusion
Although the OpenELM models are not trying to compete at a commercial level, they are an interesting alternative for experimental scenarios, since the full training pipeline is publicly available and the framework is highly customizable with your own data. They are a good fit for developers who need offline, customized, and efficient models.
The results may not be as impressive as other models', but the option to train the model from scratch is very attractive. In addition, the opportunity to port it to Apple Silicon with CoreNet opens the door to creating optimized local models for Apple devices. If you're interested in how to port OpenELM to Silicon processors, check out this repo.
Elasticsearch has many new features to help you build the best search solution for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine today.
Original article: Using Elastic and Apple's OpenELM models for RAG systems - Search Labs