文档:https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-deploy-model.html
部署刚刚下载好的内置模型
部署内存不够用
还得花钱,拉几把倒吧。就用自带的吧。
测试模型
POST _ml/trained_models/lang_ident_model_1/_infer
{
"docs":[{"text": "The fool doth think he is wise, but the wise man knows himself to be a fool."}]
}
以下是高概率预测英语的结果。
{
"inference_results": [
{
"predicted_value": "en",
"prediction_probability": 0.9999658805366392,
"prediction_score": 0.9999658805366392
}
]
}
创建管道
添加处理器
reference 推理
# Field map
{
"message": "text"
}
# Inference configuration
{
"classification":{
"num_top_classes":5
}
}
set 设置
# field
event.ingested
# value
{{{_ingest.timestamp}}}
失败处理器
测试
[
{
"_source": {
"text_field":"Hello, my name is Josh and I live in Berlin."
}
}
]
[
{
"_source":{
"message":"Sziasztok! Ez egy rövid magyar szöveg. Nézzük, vajon sikerül-e azonosítania a language identification funkciónak? Annak ellenére is sikerülni fog, hogy a szöveg két angol szót is tartalmaz."
}
}
]
测试没问题,创建管道
使用
安装插件
注意版本号与es版本一直,都是8.9.1。安装完会自行重启。
下载mapper-annotated-text安装包
映射索引
注意message字段别写错
PUT ner-test
{
"mappings": {
"properties": {
"ml.inference.predicted_value": {"type": "annotated_text"},
"ml.inference.model_id": {"type": "keyword"},
"message": {"type": "text"},
"event.ingested": {"type": "date"}
}
}
}
索引文档
通过管道 lang_ident_model_1 索引一批文档
POST /_bulk?pipeline=lang_ident_model_1
{"create":{"_index":"ner-test","_id":"1"}}
{"message":"Hello, my name is Josh and I live in Berlin."}
{"create":{"_index":"ner-test","_id":"2"}}
{"message":"I work for Elastic which was founded in Amsterdam."}
{"create":{"_index":"ner-test","_id":"3"}}
{"message":"Elastic has headquarters in Mountain View, California."}
{"create":{"_index":"ner-test","_id":"4"}}
{"message":"Elastic's founder, Shay Banon, created Elasticsearch to solve a simple need: finding recipes!"}
{"create":{"_index":"ner-test","_id":"5"}}
{"message":"Elasticsearch is built using Lucene, an open source search library."}
或者用query
POST lang-test/_doc?pipeline=ner-test
{
"message": "Mon pays ce n'est pas un pays, c'est l'hiver"
}
查看数据
"hits": [
{
"_index": "ner-test",
"_id": "1",
"_score": 1,
"_source": {
"message": "Hello, my name is Josh and I live in Berlin.",
"event": {
"ingested": "2024-04-13T20:31:48.855089336Z"
},
"ml": {
"inference": {
"predicted_value": "en",
"top_classes": [
{
"class_name": "en",
"class_probability": 0.9854748734614491,
"class_score": 0.9854748734614491
},
{
"class_name": "tg",
"class_probability": 0.003855695585908385,
"class_score": 0.003855695585908385
},
{
"class_name": "ig",
"class_probability": 0.0036940515396614113,
"class_score": 0.0036940515396614113
},
{
"class_name": "sw",
"class_probability": 0.0021393582129747924,
"class_score": 0.0021393582129747924
},
{
"class_name": "it",
"class_probability": 0.0011839650697029283,
"class_score": 0.0011839650697029283
}
],
"prediction_probability": 0.9854748734614491,
"prediction_score": 0.9854748734614491,
"model_id": "lang_ident_model_1"
}
}
}
},......
文档重新索引到新目标
POST _reindex
{
"source": {
"index": "ner-test-new",
"size": 50
},
"dest": {
"index": "ner-test",
"pipeline": "lang_ident_model_1"
}
}