How to Use Large Language Models for Event Extraction and Relation Extraction

诸神缄默不语 - Personal CSDN Blog Post Index

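Table of Contents

1. What are event extraction and relation extraction?
2. Example: event extraction and relation extraction with a large language model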

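1. What are event extraction and relation extraction?

Event extraction identifies the information in a text that describes "events". An event typically consists of key elements such as the action (signalled by a trigger word), the participants, the time, and the location.

Relation extraction, as used in this post, identifies and extracts the links between different events. Common relation types include causal relations and temporal relations.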
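To make the two definitions concrete, here is a minimal sketch of how one event and its relations could be represented in Python. The class and field names (`Event`, `Relation`, `object_`, `kind`, and so on) are assumptions made for illustration only; they are not the output format used by the prompt later in this post, which uses Chinese JSON keys. The sample sentence is the first example from that prompt.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Event:
    """One extracted event (hypothetical schema, for illustration only)."""
    description: str                # short summary of the event
    subject: str                    # who performs the action
    object_: str                    # who or what is acted upon
    trigger: str                    # the word that signals the event
    time: Optional[str] = None      # None when the text gives no time
    location: Optional[str] = None  # None when the text gives no place


@dataclass
class Relation:
    """A directed link between two events, identified by their descriptions."""
    kind: str    # "causal" or "temporal"
    source: str  # the cause event, or the earlier event
    target: str  # the effect event, or the later event


# Text: "甲公司在2023年5月成功收购了乙公司，导致了双方在市场上的竞争加剧。"
acquisition = Event("甲公司收购乙公司", "甲公司", "乙公司", "收购", time="2023年5月")
competition = Event("竞争加剧", "甲公司、乙公司", "市场", "加剧")

# The acquisition both precedes and causes the increased competition.
relations = [
    Relation("causal", acquisition.description, competition.description),
    Relation("temporal", acquisition.description, competition.description),
]

print(asdict(acquisition))
```

Event extraction fills in the `Event` records, and relation extraction fills in the `Relation` records that connect them.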

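2. Example: event extraction and relation extraction with a large language model

This post walks through a simple example that uses ZhipuAI to perform event extraction and relation extraction.
For more ways to use ZhipuAI, see another post of mine: 如何调用GLM-4 API实现智能问答 (How to Call the GLM-4 API for Intelligent Question Answering)

Example code: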

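import csv
import json
import logging
import os

from zhipuai import ZhipuAI

# Initialize the client
client = ZhipuAI(
    api_key="YOUR_ZHIPU_API_KEY"  # replace with your API key
)

# Initialize logging. Create the log directory first, because
# logging.basicConfig does not create missing directories.
log_dir = "event_extraction_logs"
os.makedirs(log_dir, exist_ok=True)
logging.basicConfig(
    filename=os.path.join(log_dir, "process_log.log"),  # log file path
    level=logging.INFO,  # log level
    format="%(asctime)s - %(levelname)s - %(message)s",  # log format
)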

# User prompt template: event extraction + relation extraction (few-shot)
user_prompt = """Based on the following examples, extract events and their attributes (参与主体、参与客体、触发词、时间、地点) as well as relationships (因果关系、时序关系) from the provided text.

# EVENT EXTRACTION EXAMPLE:
Input text:
"甲公司在2023年5月成功收购了乙公司，导致了双方在市场上的竞争加剧。"

Output:
{
    "事件抽取": [
        {
            "事件1": "甲公司收购乙公司",
            "参与主体": "甲公司",
            "参与客体": "乙公司",
            "触发词": "收购",
            "时间": "2023年5月",
            "地点": "无"
        },
        {
            "事件2": "竞争加剧",
            "参与主体": "甲公司、乙公司",
            "参与客体": "市场",
            "触发词": "加剧",
            "时间": "无",
            "地点": "市场"
        }
    ],
    "关系抽取": [
        {
            "因果关系": {
                "因事件": "甲公司收购乙公司",
                "果事件": "竞争加剧"
            }
        },
        {
            "时序关系": {
                "前事件": "甲公司收购乙公司",
                "后事件": "竞争加剧"
            }
        }
    ]
}

Input text:
"2024年4月，华为公司宣布将进入新能源汽车市场，并计划在未来三年内投资100亿人民币。"
Output:
{
    "事件抽取": [
        {
            "事件1": "华为公司进入新能源汽车市场",
            "参与主体": "华为公司",
            "参与客体": "新能源汽车市场",
            "触发词": "进入",
            "时间": "2024年4月",
            "地点": "新能源汽车市场"
        },
        {
            "事件2": "投资100亿人民币",
            "参与主体": "华为公司",
            "参与客体": "100亿人民币",
            "触发词": "投资",
            "时间": "未来三年",
            "地点": "无"
        }
    ],
    "关系抽取": [
        {
            "因果关系": "无"
        },
        {
            "时序关系": {
                "前事件": "华为公司进入新能源汽车市场",
                "后事件": "投资100亿人民币"
            }
        }
    ]
}

Input text:
"2024年6月，张三开始在甲公司工作，接着他于2024年7月参与了一个重要项目，并在项目结束后的2024年9月晋升为经理。"
Output:
{
    "事件抽取": [
        {
            "事件1": "张三开始在甲公司工作",
            "参与主体": "张三",
            "参与客体": "甲公司",
            "触发词": "开始",
            "时间": "2024年6月",
            "地点": "甲公司"
        },
        {
            "事件2": "张三参与了重要项目",
            "参与主体": "张三",
            "参与客体": "重要项目",
            "触发词": "参与",
            "时间": "2024年7月",
            "地点": "无"
        },
        {
            "事件3": "张三晋升为经理",
            "参与主体": "张三",
            "参与客体": "经理",
            "触发词": "晋升",
            "时间": "2024年9月",
            "地点": "无"
        }
    ],
    "关系抽取": [
        {
            "因果关系": {
                "因事件": "张三开始在甲公司工作",
                "果事件": "张三参与了重要项目"
            }
        },
        {
            "因果关系": {
                "因事件": "张三参与了重要项目",
                "果事件": "张三晋升为经理"
            }
        },
        {
            "时序关系": {
                "前事件": "张三开始在甲公司工作",
                "后事件": "张三参与了重要项目"
            }
        },
        {
            "时序关系": {
                "前事件": "张三参与了重要项目",
                "后事件": "张三晋升为经理"
            }
        }
    ]
}

# Input text:
{specification}

# Output:
"""

# System prompt
system_prompt = """You are a text information extraction engineer specializing in event extraction and relationship extraction.
Your task is to:
1. Extract events and their attributes: "事件", "参与主体", "参与客体", "触发词", "时间", "地点".
2. Identify relationships between events: "因果关系" and "时序关系".
Return the output as a JSON object with two main sections: "事件抽取" and "关系抽取".
"""

# Call the ZhipuAI API to extract events and relations from one text
def extract_events_and_relations(text):
    # The instructions go first with the "system" role; the filled-in
    # few-shot template is sent as the "user" message.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt.replace("{specification}", text)},
    ]

    try:
        response = client.chat.completions.create(
            model="glm-4",  # replace with the model you want to use
            messages=messages,
            temperature=0.2,
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        logging.error(f"Error while calling API: {e}")
        return None


# Main logic: read the CSV file and run extraction on each row
def process_csv(input_csv_path, output_json_path):
    results = []

    # Read the CSV file (the csv module recommends opening with newline="")
    with open(input_csv_path, mode="r", encoding="utf-8", newline="") as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader, None)  # skip the header row; tolerate an empty file

        # Run extraction on each row of text
        for row in csv_reader:
            if row and row[0].strip():  # skip empty rows and empty cells
                text = row[0]  # assumes the text is in the first CSV column
                logging.info(f"Processing text: {text}")
                extracted_info = extract_events_and_relations(text)
                if extracted_info:
                    try:
                        # Parse the model output as JSON
                        parsed_info = json.loads(extracted_info)
                        results.append(parsed_info)
                    except json.JSONDecodeError:
                        logging.error(f"Failed to parse JSON for text: {text}")
                        logging.error(f"Response content: {extracted_info}")
                        # Keep the raw reply so that nothing is lost
                        results.append({"str": extracted_info})

    # Save the results to a JSON file
    with open(output_json_path, mode="w", encoding="utf-8") as json_file:
        json.dump(results, json_file, ensure_ascii=False, indent=4)

    logging.info(f"Event extraction finished; results saved to {output_json_path}")


# Entry point
if __name__ == "__main__":
    # Input CSV file path
    input_csv_path = r"event_extraction_data\input_data.csv"
    # Output JSON file path (the output directory must already exist)
    output_json_path = r"event_extraction_output\output_events.json"
    logging.info("Starting process...")
    process_csv(input_csv_path, output_json_path)
    logging.info("Task finished!")
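
Chat models sometimes wrap their JSON in a Markdown code fence or add a sentence before or after it. When that happens, calling `json.loads` on the raw reply fails and the script above falls back to storing the reply under the "str" key. The sketch below shows one way to recover the JSON in those cases. `parse_model_json` is a hypothetical helper written for this post, not part of the zhipuai SDK, and it is a best-effort heuristic rather than a full parser.

```python
import json
import re

# Build the fence marker programmatically so this snippet can itself be
# pasted inside a fenced code block.
FENCE = "`" * 3
FENCE_RE = re.compile(
    r"^" + FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def parse_model_json(reply):
    """Return the JSON value found in a model reply, or None if there is none."""
    text = reply.strip()

    # Step 1: unwrap a reply that consists of a single fenced block.
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()

    # Step 2: the common case, where the text is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Step 3: fall back to the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    return None


print(parse_model_json(FENCE + 'json\n{"事件抽取": [], "关系抽取": []}\n' + FENCE))
```

In `process_csv`, one option is to call `parse_model_json(extracted_info)` in place of `json.loads(extracted_info)` and treat a `None` result the same way the script currently treats a `json.JSONDecodeError`.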