Weaviate



Contents

    • About Weaviate
      • Core features
      • Deployment options
      • Usage scenarios
    • Quickstart (Python)
      • 1. Create a Weaviate database
      • 2. Install
      • 3. Connect to Weaviate
      • 4. Define a data collection
      • 5. Add objects
      • 6. Query
        • 1) Semantic search
        • 2) Semantic search with a filter
    • Example use cases
      • Similarity search
      • LLMs and search
      • Classification
      • Other use cases


About Weaviate

Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.

  • Website: https://weaviate.io
  • GitHub: https://github.com/weaviate/weaviate
  • Official documentation: https://weaviate.io/developers/weaviate

Core features

[Figure: Weaviate core features]


Deployment options

Multiple deployment options are available to cater for different users and use cases.

All options offer vectorizer and RAG module integration.

[Figure: Weaviate deployment options]


Usage scenarios

Weaviate is flexible and can be used in many contexts and scenarios.

[Figure: Weaviate usage scenarios]


Quickstart (Python)

Reference: https://weaviate.io/developers/weaviate/quickstart


1. Create a Weaviate database

You can create a free cloud sandbox instance on Weaviate Cloud Services (WCS).

For instructions, see: https://weaviate.io/developers/wcs/quickstart

From the Details tab in WCS, copy the API key and the cluster URL.
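
The v4 connection code in step 3 reads its credentials from the environment variables WCS_CLUSTER_URL, WCS_API_KEY and OPENAI_APIKEY. As a minimal sketch, the check below reports which of them are still unset before you try to connect. `missing_env_vars` is a hypothetical helper written for this article, not part of the Weaviate client.

```python
import os

# Variable names match those read by the v4 connection code in step 3.
REQUIRED_VARS = ("WCS_CLUSTER_URL", "WCS_API_KEY", "OPENAI_APIKEY")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All credentials are set.")
```

Passing a plain dictionary as `environ` makes the helper easy to try without touching the real environment.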


2. Install

For the v4 client (requires Weaviate 1.23.7 or higher):

pip install -U weaviate-client

For the v3 client:

pip install "weaviate-client==3.*"

3. Connect to Weaviate

Use the API key and URL obtained in step 1, together with an OpenAI inference API key (sign up at https://platform.openai.com/signup).


Run the following code:

V4

import weaviate
import weaviate.classes as wvc
import os
import requests
import json

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass # Replace with your code. Close client gracefully in the finally block.

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

4. Define a data collection

Next, we define a data collection (a “class” in Weaviate) to store objects in.

This is analogous to creating a table in relational (SQL) databases.


The following code:

  • Configures a class object with:
    • Name Question
    • Vectorizer module text2vec-openai
    • Generative module generative-openai
  • Then creates the class.

V4

    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
    )

V3

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}

client.schema.create_class(class_obj)

5. Add objects

You can now add objects to Weaviate. You will use a batch import process for maximum efficiency.

The guide covers using the vectorizer defined for the class to create a vector embedding for each object.


The following code:

  • Loads objects, and
  • Adds objects to the target class (Question) one by one.

V4

    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)  # This uses batching under the hood

V3

import requests
import json
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
data = json.loads(resp.text)  # Load data

client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(data):  # Batch import data
        print(f"importing question: {i+1}")
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(
            data_object=properties,
            class_name="Question"
        )
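
Batching means grouping objects and sending each group together rather than issuing one request per object. The sketch below illustrates only the grouping idea in plain Python. `chunked` is a hypothetical helper and the objects are placeholders; the client's own batching logic may work differently.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Ten placeholder objects with the same property names as the quickstart data.
objects = [
    {"question": f"Q{i}", "answer": f"A{i}", "category": "DEMO"}
    for i in range(10)
]

for number, batch in enumerate(chunked(objects, batch_size=4), start=1):
    # A real client would send each batch in a single request at this point.
    print(f"batch {number}: {len(batch)} objects")
```

With ten objects and a batch size of four, the loop produces two full batches and one partial batch.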

6. Query

1) Semantic search

Let’s start with a similarity search. A nearText search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.
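
To build intuition for "most similar vectors", here is a brute-force sketch that ranks a few hand-made vectors by cosine similarity to a query vector. It is only an illustration: the vectors and labels are invented, cosine similarity is assumed as the metric, and a real vector database relies on model-generated embeddings and an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, objects, limit=2):
    """Return the `limit` objects whose vectors are most similar to the query."""
    ranked = sorted(
        objects,
        key=lambda obj: cosine_similarity(query_vector, obj["vector"]),
        reverse=True,
    )
    return ranked[:limit]


# Hand-made 3-dimensional vectors; real embeddings have far more dimensions.
toy_objects = [
    {"answer": "DNA", "vector": [0.9, 0.1, 0.0]},
    {"answer": "Liver", "vector": [0.8, 0.3, 0.1]},
    {"answer": "Stock market", "vector": [0.1, 0.2, 0.9]},
]
toy_query = [1.0, 0.2, 0.0]  # Stand-in for the embedding of "biology"

for obj in nearest(toy_query, toy_objects):
    print(obj["answer"])
```

The two objects whose vectors point in nearly the same direction as the query are returned first, and the unrelated one is dropped by the limit.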

Run the following code to search for objects whose vectors are most similar to that of biology.


V4

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    questions = client.collections.get("Question")  # Fetch the collection created in step 4

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The result (shown here in the v3 response format) is as follows:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}
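
The v3 response above is a nested dictionary, so the matching objects sit under data → Get → Question. The sketch below walks that structure using the sample response from this article. `extract_objects` is a hypothetical helper, and returning an empty list when a key is missing is a design choice made here, not documented client behavior.

```python
# Sample response copied from the v3 output shown above.
response = {
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance",
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen",
                },
            ]
        }
    }
}


def extract_objects(response, class_name):
    """Return the objects listed for `class_name`, or [] if any level is missing."""
    data = response.get("data") or {}
    get_section = data.get("Get") or {}
    return get_section.get(class_name) or []


for obj in extract_objects(response, "Question"):
    print(f'{obj["category"]}: {obj["answer"]}')
```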

2) Semantic search with a filter

You can add Boolean filters to searches. For example, the above search can be modified to return only objects that have a “category” value of “ANIMALS”. Run the following code to see the results:


V4

    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2,
        filters=wvc.query.Filter.by_property("category").equal("ANIMALS")
    )

    print(response.objects[0].properties)  # Inspect the first object

V3

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "ANIMALS"
    })
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The result is as follows:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "Elephant",
                    "category": "ANIMALS",
                    "question": "It's the only living mammal in the order Proboseidea"
                },
                {
                    "answer": "the nose or snout",
                    "category": "ANIMALS",
                    "question": "The gavial looks very much like a crocodile except for this bodily feature"
                }
            ]
        }
    }
}
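
Conceptually, a filtered semantic search restricts the candidates to objects that satisfy the filter and then ranks the remainder by vector similarity. The sketch below shows that idea on invented 2-dimensional vectors. It assumes cosine similarity and a full scan, and it is not a description of how Weaviate evaluates filters internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(query_vector, objects, category, limit=2):
    """Keep objects in `category`, then rank the survivors by similarity."""
    candidates = [obj for obj in objects if obj["category"] == category]
    candidates.sort(
        key=lambda obj: cosine_similarity(query_vector, obj["vector"]),
        reverse=True,
    )
    return candidates[:limit]


# Hand-made vectors for illustration only.
toy_objects = [
    {"answer": "DNA", "category": "SCIENCE", "vector": [0.9, 0.1]},
    {"answer": "Liver", "category": "SCIENCE", "vector": [0.8, 0.3]},
    {"answer": "Elephant", "category": "ANIMALS", "vector": [0.6, 0.5]},
    {"answer": "the nose or snout", "category": "ANIMALS", "vector": [0.4, 0.7]},
    {"answer": "Antelope", "category": "ANIMALS", "vector": [0.1, 0.9]},
]
toy_query = [1.0, 0.1]  # Stand-in for the embedding of "biology"

for obj in filtered_search(toy_query, toy_objects, "ANIMALS"):
    print(obj["answer"])
```

Although the SCIENCE objects are closer to the query, the filter removes them, so only ANIMALS objects can appear in the result.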

For more, see: https://weaviate.io/developers/weaviate/quickstart


Example use cases

This page illustrates various use cases for vector databases by way of open-source demo projects. You can fork and modify any of them.

If you would like to contribute your own project to this page, please let us know by creating an issue on GitHub.


Similarity search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#similarity-search

Vector databases enable fast, efficient similarity searches on and across any modality, such as text or images, as well as their combinations. Their similarity search capabilities can also support more complex use cases, such as recommendation systems in classical machine learning applications.

Title | Description | Modality | Code
Plant search | Semantic search over plants. | Text | JavaScript
Wine search | Semantic search over wines. | Text | Python
Book recommender system (Video, Demo) | Find book recommendations based on search query. | Text | TypeScript
Movie recommender system (Blog) | Find similar movies. | Text | JavaScript
Multilingual Wikipedia Search | Search through Wikipedia in multiple languages. | Text | TypeScript
Podcast search | Semantic search over podcast episodes. | Text | Python
Video Caption Search | Find the timestamp of the answer to your question in a video. | Text | Python
Facial Recognition | Identify people in images. | Image | Python
Image Search over dogs (Blog) | Find images of similar dog breeds based on uploaded image. | Image | Python
Text to image search | Find images most similar to a text query. | Multimodal | JavaScript
Text to image and image to image search | Find images most similar to a text or image query. | Multimodal | Python

LLMs and search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#llms-and-search

Vector databases and LLMs go together like cookies and milk!

Vector databases help address some limitations of large language models (LLMs), such as hallucinations, by retrieving relevant information and providing it to the LLM as part of its input.

Title | Description | Modality | Code
Verba, the golden RAGtriever (Video, Demo) | Retrieval-Augmented Generation (RAG) system to chat with Weaviate documentation and blog posts. | Text | Python
HealthSearch (Blog, Demo) | Recommendation system of health products based on symptoms. | Text | Python
Magic Chat | Search through Magic The Gathering cards. | Text | Python
AirBnB Listings (Blog) | Generation of customized advertisements for AirBnB listings with Generative Feedback Loops. | Text | Python
Distyll | Summarize text or video content. | Text | Python

Learn more in our LLMs and Search blog post.
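
The retrieval step described above typically ends with the retrieved passages being placed into the LLM prompt. The sketch below shows one simple way to assemble such a prompt. `build_prompt`, the prompt wording, and the hard-coded passages are all assumptions for illustration; in a real system the passages would come from a vector search, and the demo projects listed here may build their prompts differently.

```python
def build_prompt(question, passages):
    """Combine retrieved passages and the user question into one prompt string."""
    context = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Stand-ins for passages returned by a similarity search.
retrieved = [
    "The liver removes excess glucose from the blood and stores it as glycogen.",
    "Watson and Crick built a model of the molecular structure of DNA in 1953.",
]

prompt = build_prompt("Which organ stores glycogen?", retrieved)
print(prompt)
```

Numbering the passages makes it possible to ask the model to cite which passage supports its answer.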


Classification

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#classification

Weaviate can leverage its vectorization capabilities to enable automatic, real-time classification of unseen, new concepts based on its semantic understanding.

Title | Description | Modality | Code
Toxic Comment Classification | Classify whether a comment is toxic or non-toxic. | Text | Python
Audio Genre Classification | Classify the music genre of an audio file. | Image | Python
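
One generic way to classify with vectors is k-nearest-neighbors: an unseen item receives the majority label of the labeled items whose vectors are closest to it. The sketch below shows that technique on invented 2-dimensional vectors using cosine similarity. It is a textbook illustration under those assumptions and does not claim to reproduce Weaviate's classification features or the demo projects above.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_classify(vector, labeled, k=3):
    """Predict a label by majority vote among the k most similar labeled vectors."""
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(vector, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hand-made (vector, label) pairs standing in for embedded, labeled comments.
labeled = [
    ([0.9, 0.1], "toxic"),
    ([0.8, 0.2], "toxic"),
    ([0.7, 0.4], "toxic"),
    ([0.1, 0.9], "non-toxic"),
    ([0.2, 0.8], "non-toxic"),
    ([0.3, 0.7], "non-toxic"),
]

print(knn_classify([0.85, 0.15], labeled))
print(knn_classify([0.15, 0.85], labeled))
```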

Other use cases

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#other-use-cases

Weaviate’s modular ecosystem unlocks many other use cases of the Weaviate vector database, such as Named Entity Recognition or spell checking.

Title | Description | Code
Named Entity Recognition (NER) | tbd | Python

2024-03-27 (Wed)
