Author: Gustavo Llermaly, from Elastic
How to deploy and test the new Apple models and build a RAG system using Elastic.
In this article, we will learn how to deploy and test the new Apple models and build a RAG system that emulates Apple Intelligence, using Elastic as the vector database and OpenELM as the model provider.
A notebook with the complete exercise is available.
Introduction
In April, Apple released its family of open, efficient language models (OpenELM) in 270M, 450M, 1.1B, and 3B parameter sizes, each in pretrained and instruction-tuned (instruct) variants. Models with more parameters are generally better at complex tasks but are slower and consume more resources, while smaller models are faster and less demanding. The choice depends on the problem we want to solve.
The creators emphasize the models' relevance from a research perspective: they provide everything needed to train the models, and in some cases show how their models achieve better performance than competitors with fewer parameters.
What makes these models notable is their transparency, since everything needed to reproduce them is open, unlike models that only release the weights and inference code and are pretrained on private datasets.
The framework used to generate and train these models (CoreNet) is also available.
One advantage of the OpenELM models is that they can be ported to MLX, a deep learning framework optimized for devices with Apple Silicon processors, so they can benefit from that technology by training local models for those devices.
Apple has just released its new iPhones, and one of the new features is Apple Intelligence, which uses AI to help with tasks such as notification classification, context-aware recommendations, and email composition.
Let's build an application with Elastic and OpenELM that does the same!
The application flow looks like this:
Steps
- Deploy the model
- Index the data
- Test the model
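The steps above come together at query time as a small pipeline. Here is a minimal sketch of that flow; the function names are illustrative placeholders for the pieces built in the rest of the article, not part of the original code:

```python
# Hypothetical sketch of the runtime RAG flow. Each callable stands in for a
# component implemented later in the article.
def rag_answer(question, retrieve, build_prompt, generate):
    """Retrieve relevant documents, build an instruct prompt, generate an answer."""
    docs = retrieve(question)               # query Elasticsearch
    prompt = build_prompt(question, docs)   # format documents + question into a prompt
    return generate(prompt)                 # run the OpenELM model
```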
Deploying the model
The first step is to deploy the model. You can find complete information about the models here: https://huggingface.co/collections/apple/openelm-instruct-models-6619ad295d7ae9f868b759ca
We will use the instruct models because we want our model to follow instructions rather than chat with us. Instruct models are trained for one-shot requests instead of holding a conversation.
First, we need to clone the repository:
git clone https://huggingface.co/apple/OpenELM
Then, you need to get a Hugging Face access token here.
Next, you need to request access to the Llama-2-7b model on Hugging Face in order to use the OpenELM tokenizer.
After that, run the following command inside the repository folder you just cloned:
python generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
You should get a response similar to this:
Once upon a time there was a man named John Smith. He had been born in the small town of Pine Bluff, Arkansas, and raised by his single mother, Mary Ann. John's family moved to California when he was young, settling in San Francisco where he attended high school. After graduating from high school, John enlisted in the U.S. Army as a machine gunner. John's first assignment took him to Germany, serving with the 1st Battalion, 12th Infantry Regiment. During this time, John learned German and quickly became fluent in the language. In fact, it took him only two months to learn all 3,000 words of the alphabet. John's love for learning led him to attend college at Stanford University, majoring in history. While attending school, John also served as a rifleman in the 1st Armored Division. After completing his undergraduate education, John returned to California to join the U.S. Navy. Upon his return to California, John married Mary Lou, a local homemaker. They raised three children: John Jr., Kathy, and Sharon. John enjoyed spending time with
Done! We can send instructions from the command line, but we want the model to work with our own information.
Indexing the data
Now we'll index some documents in Elastic to use with the model.
To take full advantage of semantic search, make sure to deploy the ELSER model with an inference endpoint:
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
If this is your first time using ELSER, you may have to wait a while. You can check the deployment progress under Kibana > Machine Learning > Trained Models.
Tip: if you haven't deployed your own ELSER model yet, read the article "Elasticsearch: Deploying ELSER - Elastic Learned Sparse Encoder" for details.
Now we'll create the index, which will represent the data on the phone that the agent has access to.
PUT mobile-assistant
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "english",
        "copy_to": "semantic_field"
      },
      "semantic_field": {
        "type": "semantic_text",
        "inference_id": "my-elser-model"
      }
    }
  }
}
We use copy_to so the description field is available for both full-text and semantic search. Now let's add the documents:
POST _bulk
{ "index" : { "_index" : "mobile-assistant", "_id": "email1"} }
{ "title": "Team Meeting Agenda", "description": "Hello team, Let's discuss our project progress in tomorrow's meeting. Please prepare your updates. Best regards, Manager" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email2"} }
{ "title": "Client Proposal Draft", "description": "Hi, I've attached the draft of our client proposal. Could you review it and provide feedback? Thanks, Colleague" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email3"} }
{ "title": "Weekly Newsletter", "description": "This week in tech: AI advancements, new smartphone releases, and cybersecurity updates. Read more on our website!" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email4"} }
{ "title": "Urgent: Project Deadline Update", "description": "Dear team, Due to recent developments, we need to move up our project deadline. The new submission date is next Friday. Please adjust your schedules accordingly and let me know if you foresee any issues. We'll discuss this in detail during our next team meeting. Best regards, Project Manager" }
{ "index" : { "_index" : "mobile-assistant", "_id": "email5"} }
{ "title": "Invitation: Company Summer Picnic", "description": "Hello everyone, We're excited to announce our annual company summer picnic! It will be held on Saturday, July 15th, at Sunny Park. There will be food, games, and activities for all ages. Please RSVP by replying to this email with the number of guests you'll be bringing. We look forward to seeing you there! Best, HR Team" }
Testing the model
Now that we have the data and the model, we just need to connect the two so the model can do what we need.
We start by creating a function to build our system prompt. Since this is an instruct model, it doesn't hold a conversation; it receives an instruction and returns a result.
We will use a chat template to format the prompt.
def build_prompt(question, elasticsearch_documents):
    docs_text = "\n".join([
        f"Title: {doc['title']}\nDescription: {doc['description']}"
        for doc in elasticsearch_documents
    ])
    prompt = f"""<|system|>
You are Elastic Intelligence (EI), a virtual assistant on a cell phone. Answer questions about emails concisely and accurately.
You can only answer based on the context provided by the user.</s>
<|user|>
CONTEXT:
{docs_text}
QUESTION:
{question} </s>
<|assistant|>"""
    return prompt
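One caveat with a 270M-parameter model is its limited context budget. A simple guard, which is my addition rather than part of the original article, is to cap how much retrieved text goes into the prompt:

```python
# Hypothetical helper: cap the retrieved context so the prompt stays small
# enough for a small model. The limit is a rough character budget, not a real
# token count.
def truncate_docs(documents, max_chars=2000):
    kept, used = [], 0
    for doc in documents:
        text = f"Title: {doc['title']}\nDescription: {doc['description']}"
        if used + len(text) > max_chars:
            break
        kept.append(doc)
        used += len(text)
    return kept
```

You could call this on the Elasticsearch hits before passing them to build_prompt.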
Now, using semantic search, let's add a function that fetches relevant documents from Elastic based on the user's question:
def retrieve_documents(question):
    search_body = {
        "query": {
            "semantic": {
                "query": question,
                "field": "semantic_field"
            }
        }
    }
    response = client.search(index=index_name, body=search_body)
    return [hit["_source"] for hit in response["hits"]["hits"]]
Now let's try asking: "Summarize my emails". To make it easier to send the prompt, we'll call the generate function from generate_openelm.py instead of using the CLI.
from OpenELM.generate_openelm import generate

# MODEL and HUGGINGFACE_TOKEN hold the values used earlier, e.g.
# MODEL = "apple/OpenELM-270M-Instruct"
question = "Summarize my emails"
prompt = build_prompt(question, retrieve_documents(question))

output_text, generation_time = generate(
    prompt=prompt,
    model=MODEL,
    hf_access_token=HUGGINGFACE_TOKEN,
)
print("-----GENERATION TIME-----")
print(f"\033[92m {round(generation_time, 2)} \033[0m")
print("-----RESPONSE-----")
print(output_text)
The first answers varied and weren't great. In some cases we got the right answer, but in others we didn't: the model returned details about its reasoning, HTML code, or people not mentioned in the context.
If we restrict the question to a yes/no answer, the model performs better. That makes sense, since it is a small model with weaker reasoning ability.
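Since yes/no questions worked best, one practical pattern, sketched here as my own addition rather than code from the article, is to instruct the model to answer only "yes" or "no" and then parse its output defensively:

```python
# Hypothetical post-processing: small models drift, so read only the first
# token of the generated text and map it to a boolean (or None on failure).
def parse_yes_no(output_text):
    words = output_text.strip().split()
    if not words:
        return None
    first = words[0].strip(".,!").lower()
    if first == "yes":
        return True
    if first == "no":
        return False
    return None  # the model did not follow the instruction
```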
Now let's try a classification task:
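The classification loop itself isn't reproduced above, but it could look roughly like this. The prompt wording and category names are my assumptions, and `generate_fn` stands in for the `generate` call shown earlier:

```python
CATEGORIES = ["work", "personal", "newsletter"]  # assumed categories

def classify_emails(emails, generate_fn):
    """Ask the model for a one-word category per email; keep the first match."""
    results = {}
    for email in emails:
        prompt = (
            f"<|user|>\nClassify this email as one of: {', '.join(CATEGORIES)}. "
            f"Answer with a single word.\n"
            f"Title: {email['title']}\nDescription: {email['description']}</s>\n"
            f"<|assistant|>"
        )
        answer = generate_fn(prompt).strip().lower()
        results[email["title"]] = next((c for c in CATEGORIES if c in answer), "unknown")
    return results
```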
With just a small loop, the model was able to classify the emails correctly. That makes this model attractive for tasks like sorting emails or notifications by topic or relevance. Another important thing to note is how sensitive this kind of model is to prompt variations: small details such as how the task is phrased can produce very different answers.
Experiment with different prompts until you get the result you want.
Conclusion
Although the OpenELM models are not trying to compete at a commercial level, they are an interesting alternative for experimental scenarios, since the full training pipeline is publicly available and the framework is highly customizable with your own data. They are a good fit for developers who need offline, customized, and efficient models.
The results may not be as impressive as other models', but the option to train the model from scratch is very attractive. In addition, the opportunity to port it to Apple Silicon with CoreNet opens the door to creating optimized local models for Apple devices. If you're interested in how to port OpenELM to Silicon processors, check out this repo.
Elasticsearch has many new features to help you build the best search solution for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine today.
Original article: Using Elastic and Apple's OpenELM models for RAG systems - Search Labs