RAG: Principles and Local Practice

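LLM-based applications have proven enormously useful for question answering and information retrieval. General-purpose LLMs, however, are trained mainly on conversations from the internet and on data supplied by a few organizations. They can hold human-like conversations, but they fall short when a task targets a specific domain. In practice, general-purpose LLMs have two main problems: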

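  1. Confidentiality: users who care about data security do not want to hand their data over for training a general-purpose model, and without that data the model cannot give answers that meet their needs.
  2. Hallucination: this is an inherent weakness of LLMs. A typical symptom is an answer that responds to every sentence yet resolves nothing, missing the point of the question. One contributing factor is that the training data of a general-purpose model is very broad and not focused on the specific scenario the user cares about.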

Is there a technique that effectively improves an LLM's accuracy and lets it retrieve information and respond within the domain the user cares about? Yes: RAG.

A Brief Introduction to RAG

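In reference 1, Patrick Lewis et al. proposed a new architecture built around a pretrained generator and named it RAG (the paper pairs a dense passage retriever with a BART generator). RAG is not a new deep-learning network model; I prefer to think of it as a framework that builds on an LLM and combines it with other techniques.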

RAG stands for Retrieval-Augmented Generation, and the three words also name the three processing steps of the technique:

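  • Retrieval: the data-retrieval step. Given the user's query, find the content related to the query in the material the user provides.
  • Augmented: the augmentation step. Use the data found in the Retrieval step to modify the user's query; the augmented input is then passed to the LLM.
  • Generation: the LLM's traditional job. It produces output from the augmented input. Because that input now carries domain context, the output is much more closely tied to the user's domain.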
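To make the three stages concrete, here is a minimal, self-contained sketch. It is not the author's implementation: the "embedding" is a toy bag-of-words counter (an assumption standing in for a real embedding model such as all-mpnet-base-v2), and generation is delegated to whatever callable you pass in as `llm` (a hypothetical parameter).

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # Toy embedding (assumption): lowercase word counts instead of a dense vector
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    # Retrieval: rank the user's documents by similarity to the query
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def augment(query: str, contexts: list[str]) -> str:
    # Augmentation: put the retrieved passages in front of the query
    context = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context}\n\nQuery: {query}\nAnswer:"


def generate(prompt: str, llm: Callable[[str], str]) -> str:
    # Generation: hand the augmented prompt to an LLM (any str -> str callable here)
    return llm(prompt)
```

Chaining them gives the whole pipeline: `generate(augment(query, retrieve(query, docs)), llm=my_llm)`. In the real project, each toy piece is replaced by an embedding model, a vector search, and a call to the LLM's generate method.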

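RAG architecture overview:

Given an input query x, the model produces a prediction y. The query encoder q(x) embeds the query, and MIPS (Maximum Inner Product Search) scores it against the document embeddings d(z) that were computed and stored beforehand, using the inner product d(z)^{\top} q(x). The top_k most relevant documents z_{i} are used to augment the input, which is then fed to the generator, i.e. the LLM (p_{\theta}), to produce the final prediction y.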
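The MIPS step can be sketched with NumPy as an exact, brute-force search: score every stored vector d(z_i) by its inner product with q(x) and keep the top_k. The paper uses an approximate index to do this quickly over millions of passages; for a single textbook's worth of chunks, brute force is usually enough. The function name and array shapes below are illustrative assumptions.

```python
import numpy as np


def mips_top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int):
    """Exact (brute-force) maximum inner product search.

    query_vec:  shape (dim,), the query embedding q(x)
    doc_matrix: shape (num_docs, dim), one stored embedding d(z_i) per row
    Returns (scores, indices) of the k best documents, highest score first.
    """
    scores = doc_matrix @ query_vec                 # d(z_i)^T q(x) for every document
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]       # the k largest scores, in arbitrary order
    top = top[np.argsort(-scores[top])]             # sort only those k by descending score
    return scores[top], top
```

`np.argpartition` avoids fully sorting all scores, which matters little for a few thousand chunks but is the usual pattern once the collection grows.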

Running RAG Locally

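Several third-party frameworks already support RAG, such as LlamaIndex and LangChain. This article uses the simple-local-rag project on GitHub to introduce the most basic RAG implementation. The original code is a notebook, which is good for explaining each concept step by step alongside the code, but makes the code itself hard to follow as a whole, so I refactored it into modules. If you want a detailed explanation of each step, open the Colab notebook from the GitHub repository in reference 2; if you only want an overview of how RAG is implemented in code, read on.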

Based on the code in reference 2, I rewrote the implementation and pushed it to GitHub (https://github.com/yyaaron/simple-rag-practice).

Data Preparation

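  • Download the PDF, which is the source of the content used to augment the prompt. This article uses a nutrition textbook, available at https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf. The FileDownloader.download_if_not_exist method downloads it into the current folder if it is not already there.
  • Split the page content so that it can be embedded later.
    • FilePreprocessor.open_and_read_pdf loads the text of every page, splits it into sentences, and groups every 10 sentences into a chunk.
    • FilePreprocessor.pages_and_chunks joins the sentences of each group into a single chunk string and filters the chunks by size. Why chunk at all? Both the embedding model and the LLM accept only a limited input length (all-mpnet-base-v2, for example, truncates input longer than 384 word pieces), so the content has to be sliced before it is fed in. The method also discards very short chunks, which are mostly transitional sentences or page headers and footers with no real content.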
"""
Download file and process it for later use
"""
import os
import random
import re

import requests
import fitz
from tqdm.auto import tqdm
from spacy.lang.en import English


class FileDownloader:
    file_name = 'human-nutrition-text.pdf'
    download_url = 'https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf'

    def __init__(self, file_name: str = 'human-nutrition-text.pdf',
                 download_url: str = 'https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf'):
        self.download_url = download_url
        self.file_name = file_name

    @classmethod
    def download_if_not_exist(cls, file_name: str, download_url: str):
        # fall back to the class defaults when no name/URL is given
        if not file_name:
            file_name = cls.file_name

        if not download_url:
            download_url = cls.download_url

        if not os.path.exists(file_name):
            # download file from download_url (timeout avoids hanging forever on a dead connection)
            response = requests.get(download_url, timeout=60)
            if response.status_code == 200:
                with open(file_name, "wb") as file:
                    file.write(response.content)

                print("[INFO]File downloaded with name", file_name)
            else:
                print(f"[INFO]Download failed with HTTP status {response.status_code}")
        else:
            print("[INFO]File", file_name, "exists, ignore downloading")


class FilePreprocessor:
    num_sentence_chunk_size = 10

    def __init__(self):
        pass

    @classmethod
    def clean_text(cls, text) -> str:
        return text.replace("\n", " ").strip()

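    _nlp = None  # spaCy pipeline, created lazily and shared by all calls

    @classmethod
    def text_to_sentences(cls, text) -> list:
        # building the pipeline is relatively expensive, so do it once instead of once per page
        if cls._nlp is None:
            cls._nlp = English()
            cls._nlp.add_pipe("sentencizer")

        doc = cls._nlp(text)
        return list(doc.sents)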

    @classmethod
    def split_list(cls, input_list: list, slice_size: int) -> list[list[str]]:
        return [input_list[i: i + slice_size] for i in range(0, len(input_list), slice_size)]

    @classmethod
    def open_and_read_pdf(cls, file_path: str) -> list[dict]:
        doc = fitz.open(file_path)
        pages_and_texts = []
        print("[INFO]Loading content into memory...")
        for page_number, page in tqdm(enumerate(doc)):
            text = page.get_text()
            text = cls.clean_text(text)
            pages_and_texts.append(
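                {"page_number": page_number - 41,  # offset so numbers follow the book's own pagination (main text starts around PDF page 42)
                 "page_char_count": len(text),
                 "page_word_count": len(text.split(" ")),
                 "page_sentence_count_raw": len(text.split(". ")),
                 "page_token_count": len(text) / 4,  # rough estimate: 1 token is about 4 characters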
                 "text": text
                 })

        print("[INFO]Make sentences into chunks...")
        for item in tqdm(pages_and_texts):
            item["sentences"] = cls.text_to_sentences(item["text"])
            item["sentences"] = [str(sentence) for sentence in item["sentences"]]
            item["page_sentence_count_spacy"] = len(item["sentences"])
            item["sentence_chunks"] = cls.split_list(input_list=item["sentences"],
                                                     slice_size=cls.num_sentence_chunk_size)
            item["num_chunks"] = len(item["sentence_chunks"])

        print(f"[INFO]Finish loading file({file_path})")
        return pages_and_texts

    @classmethod
    def pages_and_chunks(cls, pages_and_texts: list[dict], min_token_len: int = 30) -> list[dict]:
        print(f"[INFO]Divide chunks into specified-size pieces and filter out small chunks(size <= {min_token_len})...")
        pages_and_chunks = []
        for item in tqdm(pages_and_texts):
            for sentence_chunk in item["sentence_chunks"]:
                chunk_dict = {}
                chunk_dict["page_number"] = item["page_number"]

                joined_sentence_chunk = "".join(sentence_chunk).replace("  ", " ").strip()
                # ".A" -> ". A" for any full-stop/capital letter combo
                joined_sentence_chunk = re.sub(r'\.([A-Z])', r'. \1', joined_sentence_chunk)

                chunk_token_count = len(joined_sentence_chunk) / 4  # 1 token = ~4 characters
                if chunk_token_count <= min_token_len:
                    continue  # very short chunks are mostly headers and footers, so filter them out as useless
                chunk_dict["chunk_token_count"] = chunk_token_count
                chunk_dict["sentence_chunk"] = joined_sentence_chunk
                chunk_dict["chunk_char_count"] = len(joined_sentence_chunk)
                chunk_dict["chunk_word_count"] = len([word for word in joined_sentence_chunk.split(" ")])

                pages_and_chunks.append(chunk_dict)

        print("[INFO]Finish processing chunks for each page")

        return pages_and_chunks
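To see the chunking rules without the spaCy and PyMuPDF dependencies, here is a simplified sketch. It swaps spaCy's sentencizer for a naive regular-expression splitter (an assumption that will mishandle abbreviations such as "e.g."), then applies the same two rules as the code above: group 10 sentences per chunk, and drop chunks whose estimated token count (characters / 4) is 30 or less.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_chunks(text: str, sentences_per_chunk: int = 10, min_token_len: int = 30) -> list[str]:
    sentences = split_sentences(text)
    # Same slicing as FilePreprocessor.split_list
    groups = [sentences[i:i + sentences_per_chunk]
              for i in range(0, len(sentences), sentences_per_chunk)]
    chunks = []
    for group in groups:
        chunk = " ".join(group)
        # Same filter as FilePreprocessor.pages_and_chunks: ~4 characters per token
        if len(chunk) / 4 <= min_token_len:
            continue
        chunks.append(chunk)
    return chunks
```

For example, 25 sentences yield three groups (10, 10 and 5 sentences), while a lone page footer such as "A short line." is about 3 estimated tokens and is dropped.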

Data Embedding

  • Embed the document content. An LLM does not compute directly on raw text, so we need a mathematical (numeric) representation of the text.

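We cannot simply invent an encoding of our own, such as 0 for a, 1 for b, and so on: such a code carries no meaning, so texts with similar meanings would not end up with similar vectors and similarity search would not work. Embeddings are therefore computed by a dedicated model; this article uses "all-mpnet-base-v2".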

import time
from typing import Union, List

import torch
from numpy import ndarray
from sentence_transformers import SentenceTransformer
from torch import Tensor


class Embedding:
    model_name = ""
    device = ""
    embedding_model = None

    def __init__(self, model_name="all-mpnet-base-v2", device="cuda" if torch.cuda.is_available() else "cpu"):
        self.model_name = model_name
        self.device = device
        self.embedding_model = SentenceTransformer(model_name_or_path=model_name, device=device)

    def encode(self, sentences: str | list[str], batch_size: int = 32,
               convert_to_tensor: bool = False) -> Union[List[Tensor], ndarray, Tensor]:
        embeddings = self.embedding_model.encode(sentences, batch_size=batch_size,
                                                 convert_to_tensor=convert_to_tensor, device=self.device)

        return embeddings
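What makes an embedding model useful is that it places semantically similar texts close together, where "close" is measured with cosine similarity or a dot product. The sketch below uses toy 2-D vectors (not real all-mpnet-base-v2 output) to show that once vectors are L2-normalized, the two measures coincide, so a plain dot product is enough for retrieval over normalized embeddings. If you want to guarantee normalization, SentenceTransformer.encode accepts a normalize_embeddings argument.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = a . b / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_normalize(v: np.ndarray) -> np.ndarray:
    # scale the vector to unit length
    return v / np.linalg.norm(v)


a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
# For unit vectors the denominator is 1, so the dot product equals the cosine
print(cosine_similarity(a, b), float(np.dot(l2_normalize(a), l2_normalize(b))))
```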

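After embedding, each piece of text becomes a numeric vector: no longer readable by humans, but exactly what the machine needs. Embeddings can be stored in any vector database or in a local CSV file; this article uses a CSV file because the amount of data is small.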

Loading the Embeddings

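This mirrors the previous step: load the embeddings from the CSV file into memory (CPU RAM, or GPU memory when CUDA is available).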

import pandas as pd
import numpy as np
import torch


class EmbeddingLoader:
    device = ""
    file_name = ""

    def __init__(self, device="cuda" if torch.cuda.is_available() else "cpu",
                 file_name="text_chunks_and_embeddings_df.csv"):
        self.device = device
        self.file_name = file_name

    @classmethod
    def load(cls, file_path: str = "text_chunks_and_embeddings_df.csv",
             device="cuda" if torch.cuda.is_available() else "cpu"):
        print(f"[INFO]Load embeddings of texts from file:{file_path}...")
        text_chunks_and_embedding_df = pd.read_csv(file_path)
        text_chunks_and_embedding_df["embedding"] = text_chunks_and_embedding_df["embedding"].apply(
            lambda x: np.fromstring(x.strip("[]"), sep=" "))

        pages_and_chunks = text_chunks_and_embedding_df.to_dict(orient="records")

        embeddings = (torch.tensor(np.array(text_chunks_and_embedding_df["embedding"].tolist()), dtype=torch.float32)
                      .to(device))
        print(f"[INFO]Finish loading embeddings of texts")
        return pages_and_chunks, embeddings
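EmbeddingLoader.load relies on pandas writing each NumPy array into the CSV as its printed form, e.g. "[ 0.5  -0.25  1.  ]", which np.fromstring(..., sep=" ") parses back. That works for the 768-dimensional vectors of all-mpnet-base-v2, but it may be fragile: NumPy's default print options summarize arrays with more than 1000 elements using "...", and the default 8-digit print precision may not round-trip every float32 exactly. The sketch below places the original parsing next to a JSON-based alternative; the helper names are hypothetical and the JSON format is my assumption, not part of the original project.

```python
import json

import numpy as np


def parse_numpy_repr(cell: str) -> np.ndarray:
    # What EmbeddingLoader.load does: strip the brackets, split on whitespace
    return np.fromstring(cell.strip("[]"), sep=" ")


def to_json_cell(vec: np.ndarray) -> str:
    # Alternative: store the full-precision values as a JSON list (never summarized)
    return json.dumps(vec.tolist())


def from_json_cell(cell: str) -> np.ndarray:
    return np.asarray(json.loads(cell), dtype=np.float32)
```

To use the JSON variant, convert the column before saving (`df["embedding"] = df["embedding"].apply(to_json_cell)`) and apply `from_json_cell` instead of `parse_numpy_repr` when loading.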

Loading the LLM

With the data prepared and embedded, we now need to choose an LLM to generate the answers.

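  • Choosing the LLM: pick one based on how many parameters your local CPU or GPU can handle and on what suits your use case. This article runs on an RTX 3060, so gemma-2b-it was chosen.
  • Downloading the LLM: AutoTokenizer.from_pretrained downloads the tokenizer and AutoModelForCausalLM.from_pretrained downloads the model weights. This article uses Hugging Face transformers directly, which requires requesting access on the model card page, creating an access token on the Hugging Face settings page (https://huggingface.co/settings/tokens), and logging in with that token.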
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.utils import is_flash_attn_2_available
from transformers import BitsAndBytesConfig

from rag_utils import RagUtils


class LlmUtils:
    device = "cuda" if torch.cuda.is_available() else "cpu"

    def __init__(self):
        pass

    @classmethod
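    def get_gpu_mem_size(cls) -> int:
        # report 0 GB when no CUDA device is present instead of raising an error
        if not torch.cuda.is_available():
            return 0
        gpu_memory_bytes = torch.cuda.get_device_properties(0).total_memory
        gpu_memory_gb = round(gpu_memory_bytes / (2 ** 30))
        return gpu_memory_gb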

    @classmethod
    def mode_selector(cls) -> tuple[bool, str]:
        gpu_mem_size = cls.get_gpu_mem_size()

        use_quantization_config = False
        model_id = ""
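        if gpu_mem_size < 5.1:
            print(
                f"Your available GPU memory is {gpu_mem_size}GB, you may not be able to run Gemma locally without quantization")
            # fall back to the smallest model instead of an empty model_id;
            # 4-bit quantization (bitsandbytes) needs a CUDA GPU, so only enable it when one exists
            use_quantization_config = torch.cuda.is_available()
            model_id = "google/gemma-2b-it"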
        elif gpu_mem_size < 8.1:
            print(f"GPU memory: {gpu_mem_size}GB | Recommended model: Gemma 2B in 4-bit precision")
            use_quantization_config = True
            model_id = "google/gemma-2b-it"
        elif gpu_mem_size < 19.0:
            print(
                f"GPU memory: {gpu_mem_size}GB | Recommended model: Gemma 2B in float16 or Gemma 7B in 4-bit precision")
            use_quantization_config = False
            model_id = "google/gemma-2b-it"
        elif gpu_mem_size >= 19.0:
            print(f"GPU memory: {gpu_mem_size}GB | Recommended model: Gemma 7B in 4-bit or float16 precision")
            use_quantization_config = False
            model_id = "google/gemma-7b-it"

        print(f"[INFO]use_quantization_config set to: {use_quantization_config}\n model_id set to: {model_id}")
        return use_quantization_config, model_id

    @classmethod
    def init_model(cls):
        quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

        if (is_flash_attn_2_available()) and (torch.cuda.get_device_capability(0)[0] >= 8):
            attn_implementation = "flash_attention_2"
        else:
            attn_implementation = "sdpa"
        print(f"[INFO]Attention implementation set to \"{attn_implementation}\"")

        use_quantization_config, model_id = cls.mode_selector()
        print(f"use_quantization_config: {use_quantization_config}, model_id: {model_id}")

        tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_id)

        llm_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
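                                                         # float16 halves the weight memory on GPU; fall back to float32 on CPU
                                                         torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,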
                                                         quantization_config=quantization_config if use_quantization_config else None,
                                                         low_cpu_mem_usage=False,
                                                         attn_implementation=attn_implementation,
                                                         )

        if not use_quantization_config and torch.cuda.is_available():
            llm_model.to("cuda")

        return tokenizer, llm_model

    @classmethod
    def get_model_num_params(cls, model: torch.nn.Module):
        return sum([param.numel() for param in model.parameters()])

    @classmethod
    def get_model_mem_size(cls, model: torch.nn.Module):
        mem_params = sum([param.nelement() * param.element_size() for param in model.parameters()])
        mem_buffers = sum([buf.nelement() * buf.element_size() for buf in model.buffers()])

        model_mem_bytes = mem_params + mem_buffers
        model_mem_mb = model_mem_bytes / (1024 ** 2)
        model_mem_gb = model_mem_bytes / (1024 ** 3)

        return {"model_mem_bytes": model_mem_bytes,
                "model_mem_mb": round(model_mem_mb, 2),
                "model_mem_gb": round(model_mem_gb, 2)}

    @classmethod
    def gen_prompt_with_context(cls, query: str, context_items: list[dict], tokenizer):
        print(f"input text: \n{query}")

        context = "- " + "\n- ".join([item["sentence_chunk"] for item in context_items])

        base_prompt = """Based on the following context items, please answer the query.
        Give yourself room to think by extracting relevant passages from the context before answering the query.
        Don't return the thinking, only return the answer.
        Make sure your answers are as explanatory as possible.
        Use the following examples as reference for the ideal answer style.
        \nExample 1:
        Query: What are the fat-soluble vitamins?
        Answer: The fat-soluble vitamins include Vitamin A, Vitamin D, Vitamin E, and Vitamin K. These vitamins are absorbed along with fats in the diet and can be stored in the body's fatty tissue and liver for later use. Vitamin A is important for vision, immune function, and skin health. Vitamin D plays a critical role in calcium absorption and bone health. Vitamin E acts as an antioxidant, protecting cells from damage. Vitamin K is essential for blood clotting and bone metabolism.
        \nExample 2:
        Query: What are the causes of type 2 diabetes?
        Answer: Type 2 diabetes is often associated with overnutrition, particularly the overconsumption of calories leading to obesity. Factors include a diet high in refined sugars and saturated fats, which can lead to insulin resistance, a condition where the body's cells do not respond effectively to insulin. Over time, the pancreas cannot produce enough insulin to manage blood sugar levels, resulting in type 2 diabetes. Additionally, excessive caloric intake without sufficient physical activity exacerbates the risk by promoting weight gain and fat accumulation, particularly around the abdomen, further contributing to insulin resistance.
        \nExample 3:
        Query: What is the importance of hydration for physical performance?
        Answer: Hydration is crucial for physical performance because water plays key roles in maintaining blood volume, regulating body temperature, and ensuring the transport of nutrients and oxygen to cells. Adequate hydration is essential for optimal muscle function, endurance, and recovery. Dehydration can lead to decreased performance, fatigue, and increased risk of heat-related illnesses, such as heat stroke. Drinking sufficient water before, during, and after exercise helps ensure peak physical performance and recovery.
        \nNow use the following context items to answer the user query:
        {context}
        \nRelevant passages: <extract relevant passages from the context here>
        User query: {query}
        Answer:"""

        base_prompt = base_prompt.format(context=context, query=query)

        dialogue_template = [
            {"role": "user",
             "content": base_prompt}
        ]

        prompt = tokenizer.apply_chat_template(conversation=dialogue_template,
                                               tokenize=False,
                                               add_generation_prompt=True)
        return prompt

    @classmethod
    def ask(cls, query, embeddings, embedding_model, pages_and_chunks, model, tokenizer,
            temperature=0.7,
            max_new_tokens=512,
            format_answer_text=True,
            return_answer_only=True):
        scores, indices = RagUtils.retrieve_relevant_resources(query=query,
                                                               embeddings=embeddings, model=embedding_model)
        context_items = [pages_and_chunks[i] for i in indices]

        for i, item in enumerate(context_items):
            item["score"] = scores[i].cpu()

        prompt = cls.gen_prompt_with_context(query=query, context_items=context_items, tokenizer=tokenizer)

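        # the chat template (e.g. Gemma's) already inserts <bos>, so don't let the tokenizer add a second one
        inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(cls.device)

        outputs = model.generate(**inputs,
                                 temperature=temperature,
                                 do_sample=True,
                                 max_new_tokens=max_new_tokens)

        if format_answer_text:
            # decode only the newly generated tokens, dropping special tokens such as <eos>
            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
            output_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
            output_text = output_text.replace("Sure, here's the answer to the user's query:\n\n", "")
        else:
            output_text = tokenizer.decode(outputs[0])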

        if return_answer_only:
            return output_text

        return output_text, context_items
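The memory thresholds in LlmUtils.mode_selector, and the choice of dtype when loading, follow from a back-of-the-envelope estimate: the weights alone need roughly parameters × bits per parameter / 8 bytes. The sketch below assumes about 2.5 billion parameters for gemma-2b-it (an approximation for illustration) and ignores activations and the KV cache, so real usage is higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    approx_params = 2.5e9  # assumption: rough parameter count of gemma-2b-it
    for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
        print(f"{label:>7}: ~{weight_memory_gib(approx_params, bits):.1f} GiB")
```

Under this assumption the weights take roughly 9.3 GiB in float32, 4.7 GiB in float16 and 1.2 GiB in 4-bit, which is why half precision or quantization matters on consumer GPUs.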

Prompt Augmentation

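  • Similarity search: given the user's query, find the top_k most similar chunks in the PDF content. RagUtils.retrieve_relevant_resources computes the similarity, and LlmUtils.ask calls it to look up the relevant context.
  • Building the augmented prompt: LlmUtils.gen_prompt_with_context combines the user's query with the top_k similar chunks found in the previous step into an augmented prompt that contains the context.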
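The source of rag_utils.py is not listed here, so the exact implementation of RagUtils.retrieve_relevant_resources is not shown. Judging from how LlmUtils.ask uses it (it returns scores and indices, and each score supports `.cpu()`), a plausible reconstruction is a dot-product search followed by `torch.topk`. Treat the following as a hypothetical sketch; the `n_resources_to_return` parameter is an assumed name.

```python
import torch


def retrieve_relevant_resources(query: str, embeddings: torch.Tensor, model,
                                n_resources_to_return: int = 5):
    """Hypothetical sketch: return (scores, indices) of the chunks most similar to the query.

    `model` is any object with encode(text, convert_to_tensor=True), e.g. a SentenceTransformer.
    `embeddings` is the (num_chunks, dim) tensor returned by EmbeddingLoader.load.
    """
    query_embedding = model.encode(query, convert_to_tensor=True)
    # Match the device and dtype of the stored chunk embeddings before multiplying
    query_embedding = query_embedding.to(device=embeddings.device, dtype=embeddings.dtype)

    # Dot product of the query with every chunk: shape (num_chunks,)
    dot_scores = embeddings @ query_embedding

    # Keep the k highest-scoring chunks, sorted from most to least similar
    k = min(n_resources_to_return, embeddings.shape[0])
    scores, indices = torch.topk(dot_scores, k=k)
    return scores, indices
```

In LlmUtils.ask, the returned indices select entries of pages_and_chunks, and each score is attached to its context item.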

Generating the Answer

After building the augmented prompt, LlmUtils.ask calls the LLM's generate method to produce the answer.
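LlmUtils.ask calls generate with do_sample=True and temperature=0.7. As a simplified picture of what those two arguments do: the next-token logits are divided by the temperature, converted to probabilities with softmax, and the next token is sampled from that distribution. Lower temperatures sharpen the distribution (more deterministic answers), higher ones flatten it. This NumPy sketch illustrates only that step; transformers' generate may additionally apply other logits processors (for example top-k filtering from the model's generation config).

```python
from typing import Optional

import numpy as np


def softmax_with_temperature(logits, temperature: float) -> np.ndarray:
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()            # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()


def sample_next_token(logits, temperature: float = 0.7,
                      rng: Optional[np.random.Generator] = None) -> int:
    # Sample one token id from the temperature-scaled distribution
    rng = rng or np.random.default_rng()
    probs = softmax_with_temperature(logits, temperature)
    return int(rng.choice(len(probs), p=probs))
```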

Entry Point: main.py

The code below chains the previous steps together to produce an answer grounded in the local content.

import pandas as pd
import torch

from embedding import Embedding
from embedding_loader import EmbeddingLoader
from file_processor import FileDownloader, FilePreprocessor
from llm_utils import LlmUtils

if __name__ == '__main__':
    # ***********************************
    # STEP 1: File preprocess
    # ***********************************

    # download example PDF file
    file_name = 'human-nutrition-text.pdf'
    download_url = 'https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf'
    FileDownloader().download_if_not_exist(file_name=file_name, download_url=download_url)

    # by default, we use a pdf file named "human-nutrition-text.pdf"
    pages_and_texts = FilePreprocessor.open_and_read_pdf(file_name)

    # turn texts into chunks and drop chunks of 30 tokens or fewer, as they are usually meaningless
    pages_and_chunks_over_min_size = FilePreprocessor.pages_and_chunks(pages_and_texts=pages_and_texts,
                                                                       min_token_len=30)

    # ***********************************
    # STEP 2: Embedding text chunks
    # ***********************************

    # embedding text chunks
    device = "cuda" if torch.cuda.is_available() else "cpu"  # set device "cuda" if gpu available
    embedding_tool = Embedding(model_name="all-mpnet-base-v2", device=device)
    print(f"[INFO]Embedding sentences by model {embedding_tool.model_name}, on device({embedding_tool.device})...")
    # encode all chunks in a single call so that batch_size actually batches them
    chunk_texts = [item["sentence_chunk"] for item in pages_and_chunks_over_min_size]
    chunk_embeddings = embedding_tool.encode(chunk_texts, batch_size=32, convert_to_tensor=False)
    for item, embedding in zip(pages_and_chunks_over_min_size, chunk_embeddings):
        item["embedding"] = embedding
    print("[INFO]Finish embedding.")

    # ***********************************
    # STEP 3: Save embeddings into csv
    # ***********************************

    # embeddings can be saved in any vector database, but here we use csv file for simplicity
    text_chunks_and_embeddings_df = pd.DataFrame(pages_and_chunks_over_min_size)
    embeddings_df_save_path = "text_chunks_and_embeddings_df.csv"
    text_chunks_and_embeddings_df.to_csv(embeddings_df_save_path, index=False)

    # ***********************************
    # STEP 4: Load embeddings from CSV
    # ***********************************

    # text_chunks_and_embedding_df = pd.read_csv(embeddings_df_save_path)
    # text_chunks_and_embedding_df["embedding"] = text_chunks_and_embedding_df["embedding"].apply(
    #                                                                 lambda x: np.fromstring(x.strip("[]"), sep=" "))
    # pages_and_chunks = text_chunks_and_embedding_df.to_dict(orient="records")
    # embeddings = (torch.tensor(np.array(text_chunks_and_embedding_df["embedding"].tolist()), dtype=torch.float32)
    #               .to(device))

    pages_and_chunks, embeddings = EmbeddingLoader.load(file_path=embeddings_df_save_path, device=device)

    # ***********************************
    # STEP 5: Load LLM
    # ***********************************

    # you can select any LLM according to your scenario, here we select google/gemma-2b-it for demo use
    # LLM can be downloaded from huggingface or kaggle.
    # here we download LLM from huggingface, you should have account of it and login to find your token.
    # access token setting page: https://huggingface.co/settings/tokens
    tokenizer, llm = LlmUtils.init_model()

    # ***********************************
    # STEP 6: generate answer
    # ***********************************

    # at last, we ask LLM with query and contexts with the highest relevance.
    # relevance between context and query is computed in RagUtils.retrieve_relevant_resources
    query = "What are the macronutrients, and what roles do they play in the human body?"
    answer = LlmUtils.ask(query=query, embeddings=embeddings, embedding_model=embedding_tool.embedding_model,
                          pages_and_chunks=pages_and_chunks, model=llm, tokenizer=tokenizer,
                          return_answer_only=True)

    print(f"Answer:\n{answer}")

References:

1. Patrick Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"

2. GitHub repo: simple-local-rag

3. YouTube: "Local Retrieval Augmented Generation (RAG) from Scratch (step by step tutorial)", https://www.youtube.com/watch?v=qN_2fnOPY-M
