Building a WordPiece Vocabulary

Contents

    • 1. Brief Introduction
    • 2. Procedure
      • 2.1 Preprocessing
      • 2.2 Counting
      • 2.3 Splitting
      • 2.4 Adding Subwords
    • 3. Code Implementation

This article describes how to build a WordPiece vocabulary from a given text corpus. The code comes from Google.

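1. Brief Introduction

The goal of WordPiece is to take the internal structure of words into account and make full use of subwords, striking a good balance between two aims: breaking long words into shorter ones, which makes the text representation more flexible, and keeping the conversion of words efficient.

The former increases the vocabulary size; the latter reduces it.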

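2. Procedure

2.1 Preprocessing

After reading in all the text, the first step is to preprocess it.

  • For English, we can lowercase all characters, strip accents (á becomes a), and then split on whitespace and punctuation.
  • For Chinese, we can convert Traditional characters to Simplified. The only direct way to split is character by character. The only way to improve on that is to bring in an external tokenizer to segment the text into words before the later steps, because a single Chinese character cannot be decomposed any further. One might think of decomposing characters into radicals, but then how would the order of the radicals be determined? The rest of this article focuses on English.

Once the text has been split into word-level chunks, we move on to the next step.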
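
The English preprocessing described above can be sketched as follows. This is a minimal illustration, not the preprocessing used by the Google code (which the article omits); the function name preprocess_english and the regular expression are assumptions made for this example. Note that the regex treats an underscore as a word character.

```python
import re
import unicodedata


def preprocess_english(text):
    """Lowercase, strip accents, then split on whitespace and punctuation."""
    text = text.lower()
    # NFD splits "á" into "a" plus a combining accent (category "Mn"),
    # and the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", stripped)


print(preprocess_english("Héllo, wórld!"))  # ['hello', ',', 'world', '!']
```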

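2.2 Counting

The preprocessing stage gives us word-level chunks. To get an overall picture of the words, we count each word and sort the words by count in descending order. If the corpus is large, we can optimize here by filtering out words whose counts are too high or too low, as well as words that are too long.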
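
A small sketch of this counting and filtering step, assuming the words are already available as a Python list. The helper count_and_filter and its parameters min_count, max_count and max_length are hypothetical names chosen for illustration; they are not part of the Google code.

```python
import collections


def count_and_filter(words, min_count=1, max_count=None, max_length=50):
    """Count words, drop those outside the count bounds or too long, sort by count."""
    counts = collections.Counter(words)
    kept = [
        (word, count)
        for word, count in counts.items()
        if count >= min_count
        and (max_count is None or count <= max_count)
        and len(word) <= max_length
    ]
    # Sort by count (descending), breaking ties alphabetically for determinism.
    return sorted(kept, key=lambda wc: (-wc[1], wc[0]))


words = ["the", "cat", "the", "dog", "the", "cat", "antidisestablishmentarianism"]
print(count_and_filter(words, min_count=2, max_length=10))  # [('the', 3), ('cat', 2)]
```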

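Since a word piece is essentially a subword, splitting words into subwords sensibly requires us to consider the basic units of a word. If the vocabulary lacks the basic units that make up a word, that word either cannot be represented at all, or is represented incompletely and gets confused with other words.

So at this point we count how often each individual character occurs across all words. Likewise, we can optimize here by dropping characters that occur rarely. Once a rare character is dropped, words containing it can no longer be represented, so we must also drop every word that contains such a character.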
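
The character-level filtering can be sketched like this. It assumes a simple minimum-count cutoff (min_char_count, a hypothetical parameter); the Google code in section 3 instead keeps the top max_unique_chars characters, so treat this only as an illustration of the idea.

```python
import collections


def drop_rare_chars(word_counts, min_char_count):
    """Return (allowed_chars, kept_words) after removing rare characters."""
    char_counts = collections.Counter()
    for word, count in word_counts:
        for char in word:
            # Each character is weighted by the count of the word it appears in.
            char_counts[char] += count
    allowed = {c for c, n in char_counts.items() if n >= min_char_count}
    # A word survives only if every one of its characters is still allowed.
    kept = [(w, n) for w, n in word_counts if all(c in allowed for c in w)]
    return allowed, kept


allowed, kept = drop_rare_chars([("cat", 5), ("car", 4), ("qat", 1)], min_char_count=2)
print(sorted(allowed))  # ['a', 'c', 'r', 't']
print(kept)             # [('cat', 5), ('car', 4)]
```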

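2.3 Splitting

In this stage we split each word in the count dictionary as follows.

Set a start pointer and an end pointer on the word, and look up the substring between them in the union of the count dictionary and the character dictionary. On a match, move the start pointer to the end pointer and reset the end pointer to the end of the word. On a miss, move the end pointer one step toward the start pointer. The loop stops when the two pointers meet: if they meet at the end of the word, return output_tokens; if they meet anywhere else, the word cannot be split and None is returned.

The process is illustrated in the figure (figure omitted):

The code is as follows: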

def get_split_indices(word, curr_tokens, include_joiner_token, joiner):  
    indices = []  
    start = 0  
    while start < len(word):  
        end = len(word)  
        while end > start:  
            subtoken = word[start:end]  
            # Subtoken includes the joiner token.  
            if include_joiner_token and start > 0:  
                subtoken = joiner + subtoken  
            # If subtoken is part of vocab, 'end' is a valid start index.  
            if subtoken in curr_tokens:  
                indices.append(end)  
                break  
            end -= 1  
        if end == start:
            # No vocabulary entry matches at this position: the word cannot be split.
            return None
        start = end  
    return indices  
  
  
if __name__ == '__main__':
    res = get_split_indices('hello', ['h', '##e', '##llo', '##o'], True, '##')
    print(res)  # [1, 2, 5]
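
The prose above speaks of returning output_tokens, while the listing returns split indices. As a complement, here is a minimal sketch of the same greedy longest-match loop that collects the matched pieces themselves. The name greedy_tokenize is an assumption for this example and does not appear in the Google code.

```python
def greedy_tokenize(word, vocab, joiner="##"):
    """Greedy longest-match split of `word`; returns tokens, or None on failure."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the joiner prefix.
                piece = joiner + piece
            if piece in vocab:
                tokens.append(piece)
                break
            end -= 1
        if end == start:
            # Nothing in the vocabulary matches here, so the word cannot be split.
            return None
        start = end
    return tokens


vocab = {"h", "##e", "##llo", "##o"}
print(greedy_tokenize("hello", vocab))  # ['h', '##e', '##llo']
print(greedy_tokenize("help", vocab))   # None
```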

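2.4 Adding Subwords

The splitting step above is in effect looking for the largest pieces, but it uses a greedy algorithm, so the result is not necessarily optimal. Once we have the indices of the greedy split of each word, we can more quickly find strings that frequently occur together. The procedure is: for each index, take every subword that starts at that position, with the length increasing one character at a time, and accumulate the counts in a subword dictionary. Each occurrence adds the word's count (word.count) to the subword's total.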
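
A sketch of this enumeration for a single word, mirroring the inner loop of learn_with_thresh in section 3. The helper name count_candidates is hypothetical, and the split indices [1, 2, 5] are taken from the 'hello' example in section 2.3.

```python
import collections


def count_candidates(word, count, split_indices, joiner="##"):
    """Count every substring that starts at a split point and extends rightward."""
    candidates = collections.Counter()
    start = 0
    for index in split_indices:
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if start > 0:
                # Non-initial pieces are stored with the joiner prefix.
                piece = joiner + piece
            # Every candidate inherits the full count of the word.
            candidates[piece] += count
        start = index
    return candidates


cands = count_candidates("hello", 3, [1, 2, 5])
print(sorted(cands))
```

For 'hello' this yields candidates starting at positions 0, 1 and 2 only (for example 'he', '##ell', '##llo'), each with count 3; a piece such as '##o' is never proposed because position 4 is not a split point.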

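This enumeration produces a very large number of subwords, so where necessary we prune them, for example by dropping subwords that are too long or that occur too rarely. That completes the step of adding subwords.

Note, however, that subwords are counted more than once here: whenever a long string is counted, every shorter prefix of it is counted as well. We therefore traverse from long strings to short ones, and once a long string is frequent enough to be admitted to the vocabulary, we subtract its count from every shorter string that is a prefix of it, so that the counts do not interfere with each other.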
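
The prefix discount can be sketched as below. For brevity this example ignores the joiner, and like the listing in section 3 it discounts prefixes for every candidate, not only the admitted ones. The function name select_with_prefix_decrement and the toy counts are assumptions for illustration.

```python
def select_with_prefix_decrement(candidates, thresh):
    """Pick candidates with count >= thresh, longest first, discounting prefixes."""
    counts = dict(candidates)
    selected = {}
    # Longest first, so a long piece claims its occurrences before its prefixes.
    for token in sorted(counts, key=len, reverse=True):
        count = counts[token]
        if count >= thresh:
            selected[token] = count
        # Remove this token's occurrences from each of its proper prefixes.
        for i in range(1, len(token)):
            prefix = token[:i]
            if prefix in counts:
                counts[prefix] -= count
    return selected


# Toy counts as produced by "play" (count 5) and "plus" (count 2).
candidates = {"p": 7, "pl": 7, "pla": 5, "play": 5, "plu": 2, "plus": 2}
print(select_with_prefix_decrement(candidates, thresh=2))  # {'play': 5, 'plus': 2}
```

After the two full words claim their occurrences, the shorter prefixes are left with a count of zero and are not admitted.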

At the same time, the vocabulary does not necessarily contain the character dictionary, so the two need to be merged. The resulting vocabulary is the WordPiece vocabulary.

3. Code Implementation

First we preprocess the words; that code is omitted here.

Here we pass in an iterable and use Counter from the collections library to count each word.

import collections

import numpy as np


def count_words(iterable) -> collections.Counter:
    """Converts an iterable of arrays of words into a `Counter` of word counts."""
    counts = collections.Counter()  
    for words in iterable:  
        # Convert a RaggedTensor to a flat/dense Tensor.  
        words = getattr(words, 'flat_values', words)  
        # Flatten any dense tensor  
        words = np.reshape(words, [-1])  
        counts.update(words)  
  
    # Decode the words if necessary.  
    example_word = next(iter(counts.keys()))  
    if isinstance(example_word, bytes):  
        counts = collections.Counter(  
            {word.decode('utf-8'): count for word, count in counts.items()})  
  
    return counts

Use the current word counts together with upper_thresh and lower_thresh to determine the bounds of the frequency threshold search.

def get_search_threshs(word_counts, upper_thresh, lower_thresh):  
    """Clips the thresholds for binary search based on current word counts.  
  
    The upper threshold parameter typically has a large default value that can
    result in many iterations of unnecessary search. Thus we clip the upper and
    lower bounds of search to the maximum and the minimum wordcount values.

    Args:
      word_counts: list of (string, int) tuples
      upper_thresh: int, upper threshold for binary search
      lower_thresh: int, lower threshold for binary search

    Returns:
      upper_search: int, clipped upper threshold for binary search
      lower_search: int, clipped lower threshold for binary search
    """
    counts = [count for _, count in word_counts]  
    max_count = max(counts)  
    min_count = min(counts)  
  
    if upper_thresh is None:  
        upper_search = max_count  
    else:  
        upper_search = max_count if max_count < upper_thresh else upper_thresh  
  
    if lower_thresh is None:  
        lower_search = min_count  
    else:  
        lower_search = min_count if min_count > lower_thresh else lower_thresh  
  
    return upper_search, lower_search

Cap the number of distinct single characters.

def get_allowed_chars(all_counts, max_unique_chars):  
    """Get the top max_unique_chars characters within our wordcounts.  
  
    We want each character to be in the vocabulary so that we can keep splitting
    down to the character level if necessary. However, in order not to inflate
    our vocabulary with rare characters, we only keep the top max_unique_chars
    characters.

    Args:
      all_counts: list of (string, int) tuples
      max_unique_chars: int, maximum number of unique single-character tokens

    Returns:
      set of strings containing top max_unique_chars characters in all_counts
    """
    char_counts = collections.defaultdict(int)  
  
    for word, count in all_counts:  
        for char in word:  
            char_counts[char] += count  
  
    # Sort by count, then alphabetically.  
    sorted_counts = sorted(sorted(char_counts.items(), key=lambda x: x[0]),  
                           key=lambda x: x[1], reverse=True)  
  
    allowed_chars = set()  
    for i in range(min(len(sorted_counts), max_unique_chars)):  
        allowed_chars.add(sorted_counts[i][0])  
    return allowed_chars

Using all_counts and allowed_chars, drop every word that contains a character outside allowed_chars, and keep at most the max_input_tokens most frequent words.

def filter_input_words(all_counts, allowed_chars, max_input_tokens):  
    """Filters out words with unallowed chars and limits words to max_input_tokens.  
  
    Args:
      all_counts: list of (string, int) tuples
      allowed_chars: list of single-character strings
      max_input_tokens: int, maximum number of tokens accepted as input

    Returns:
      list of (string, int) tuples of filtered wordcounts
    """
    # Ensure that the input is sorted so that if `max_input_tokens` is reached
    # the least common tokens are dropped.
    all_counts = sorted(
        all_counts, key=lambda word_and_count: word_and_count[1], reverse=True)
    filtered_counts = []  
    for word, count in all_counts:  
        if (max_input_tokens != -1 and  
                len(filtered_counts) >= max_input_tokens):  
            break  
        has_unallowed_chars = False  
        for char in word:  
            if char not in allowed_chars:  
                has_unallowed_chars = True  
                break
        if has_unallowed_chars:
            continue
        filtered_counts.append((word, count))  
  
    return filtered_counts

Get the split indices. This requires that curr_tokens can actually split the word.

def get_split_indices(word, curr_tokens, include_joiner_token, joiner):  
    """Gets indices for valid substrings of word, for iterations > 0.  
  
    For iterations > 0, rather than considering every possible substring, we only
    want to consider starting points corresponding to the start of wordpieces in
    the current vocabulary.

    Args:
      word: string we want to split into substrings
      curr_tokens: string to int dict of tokens in vocab (from previous iteration)
      include_joiner_token: bool whether to include joiner token
      joiner: string used to indicate suffixes

    Returns:
      list of ints containing valid starting indices for word
    """
    indices = []  
    start = 0  
    while start < len(word):  
        end = len(word)  
        while end > start:  
            subtoken = word[start:end]  
            # Subtoken includes the joiner token.  
            if include_joiner_token and start > 0:  
                subtoken = joiner + subtoken  
            # If subtoken is part of vocab, 'end' is a valid start index.  
            if subtoken in curr_tokens:  
                indices.append(end)  
                break  
            end -= 1  
  
        if end == start:  
            return None  
        start = end  
  
    return indices

Now for the final steps.

import collections  
from typing import List, Optional  
  
  
Params = collections.namedtuple('Params', [  
    'upper_thresh', 'lower_thresh', 'num_iterations', 'max_input_tokens',  
    'max_token_length', 'max_unique_chars', 'vocab_size', 'slack_ratio',  
    'include_joiner_token', 'joiner', 'reserved_tokens'  
])  
  
  
def extract_char_tokens(word_counts):  
    """Extracts all single-character tokens from word_counts.  
  
    Args:
      word_counts: list of (string, int) tuples

    Returns:
      set of single-character strings contained within word_counts
    """
    seen_chars = set()  
    for word, _ in word_counts:  
        for char in word:  
            seen_chars.add(char)  
    return seen_chars  
  
  
def ensure_all_tokens_exist(input_tokens, output_tokens, include_joiner_token,  
                            joiner):  
    """Adds all tokens in input_tokens to output_tokens if not already present.  
  
    Args:
      input_tokens: set of strings (tokens) we want to include
      output_tokens: string to int dictionary mapping token to count
      include_joiner_token: bool whether to include joiner token
      joiner: string used to indicate suffixes

    Returns:
      string to int dictionary with all tokens in input_tokens included
    """
    for token in input_tokens:  
        if token not in output_tokens:  
            output_tokens[token] = 1  
  
        if include_joiner_token:  
            joined_token = joiner + token  
            if joined_token not in output_tokens:  
                output_tokens[joined_token] = 1  
  
    return output_tokens  
  
  
def get_search_threshs(word_counts, upper_thresh, lower_thresh):  
    """Clips the thresholds for binary search based on current word counts.  
  
    The upper threshold parameter typically has a large default value that can
    result in many iterations of unnecessary search. Thus we clip the upper and
    lower bounds of search to the maximum and the minimum wordcount values.

    Args:
      word_counts: list of (string, int) tuples
      upper_thresh: int, upper threshold for binary search
      lower_thresh: int, lower threshold for binary search

    Returns:
      upper_search: int, clipped upper threshold for binary search
      lower_search: int, clipped lower threshold for binary search
    """
    counts = [count for _, count in word_counts]  
    max_count = max(counts)  
    min_count = min(counts)  
  
    if upper_thresh is None:  
        upper_search = max_count  
    else:  
        upper_search = max_count if max_count < upper_thresh else upper_thresh  
  
    if lower_thresh is None:  
        lower_search = min_count  
    else:  
        lower_search = min_count if min_count > lower_thresh else lower_thresh  
  
    return upper_search, lower_search  
  
  
def get_input_words(word_counts, reserved_tokens, max_token_length):  
    """Filters out words that are longer than max_token_length or are reserved.  
  
    Args:
      word_counts: list of (string, int) tuples
      reserved_tokens: list of strings
      max_token_length: int, maximum length of a token

    Returns:
      list of (string, int) tuples of filtered wordcounts
    """
    all_counts = []  
  
    for word, count in word_counts:  
        if len(word) > max_token_length or word in reserved_tokens:  
            continue  
        all_counts.append((word, count))  
  
    return all_counts  
  
  
def generate_final_vocabulary(reserved_tokens, char_tokens, curr_tokens):  
    """Generates final vocab given reserved, single-character, and current tokens.  
  
    Args:
      reserved_tokens: list of strings (tokens) that must be included in vocab
      char_tokens: set of single-character strings
      curr_tokens: string to int dict mapping token to count

    Returns:
      list of strings representing final vocabulary
    """
    sorted_char_tokens = sorted(list(char_tokens))  
    vocab_char_arrays = []  
    vocab_char_arrays.extend(reserved_tokens)  
    vocab_char_arrays.extend(sorted_char_tokens)  
  
    # Sort by count, then alphabetically.  
    sorted_tokens = sorted(sorted(curr_tokens.items(), key=lambda x: x[0]),  
                           key=lambda x: x[1], reverse=True)  
    for token, _ in sorted_tokens:  
        vocab_char_arrays.append(token)  
  
    seen_tokens = set()  
    # Adding unique tokens to list to maintain sorted order.  
    vocab_words = []  
    for word in vocab_char_arrays:  
        if word in seen_tokens:  
            continue  
        seen_tokens.add(word)  
        vocab_words.append(word)  
  
    return vocab_words  
  
  
def learn_with_thresh(word_counts, thresh, params):  
    """Wordpiece learning algorithm to produce a vocab given frequency threshold.  
  
    Args:
      word_counts: list of (string, int) tuples
      thresh: int, frequency threshold for a token to be included in the vocab
      params: Params namedtuple, parameters for learning

    Returns:
      list of strings, vocabulary generated for the given thresh
    """
    # Set of single-character tokens.  
    char_tokens = extract_char_tokens(word_counts)  
    curr_tokens = ensure_all_tokens_exist(char_tokens, {},  
                                          params.include_joiner_token,  
                                          params.joiner)  
  
    for iteration in range(params.num_iterations):  
        subtokens = [dict() for _ in range(params.max_token_length + 1)]  
        # Populate array with counts of each subtoken.  
        for word, count in word_counts:  
            if iteration == 0:  
                split_indices = range(1, len(word) + 1)  
            else:  
                split_indices = get_split_indices(word, curr_tokens,  
                                                  params.include_joiner_token,  
                                                  params.joiner)  
                if not split_indices:  
                    continue  
  
            start = 0  
            for index in split_indices:  
                for end in range(start + 1, len(word) + 1):  
                    subtoken = word[start:end]  
                    length = len(subtoken)  
                    if params.include_joiner_token and start > 0:  
                        subtoken = params.joiner + subtoken  
                    if subtoken in subtokens[length]:  
                        # Subtoken exists, increment count.  
                        subtokens[length][subtoken] += count  
                    else:  
                        # New subtoken, add to dict.  
                        subtokens[length][subtoken] = count  
                start = index  
  
        next_tokens = {}  
        # Get all tokens that have a count above the threshold.  
        for length in range(params.max_token_length, 0, -1):  
            for token, count in subtokens[length].items():  
                if count >= thresh:  
                    next_tokens[token] = count  
                # Decrement the count of all prefixes.  
                if len(token) > length:  # This token includes the joiner.  
                    joiner_len = len(params.joiner)  
                    for i in range(1 + joiner_len, length + joiner_len):  
                        prefix = token[0:i]  
                        if prefix in subtokens[i - joiner_len]:  
                            subtokens[i - joiner_len][prefix] -= count  
                else:  
                    for i in range(1, length):  
                        prefix = token[0:i]  
                        if prefix in subtokens[i]:  
                            subtokens[i][prefix] -= count  
  
        # Add back single-character tokens.  
        curr_tokens = ensure_all_tokens_exist(char_tokens, next_tokens,  
                                              params.include_joiner_token,  
                                              params.joiner)  
  
    vocab_words = generate_final_vocabulary(params.reserved_tokens, char_tokens,  
                                            curr_tokens)  
  
    return vocab_words  
  
  
def learn_binary_search(word_counts, lower, upper, params):  
    """Performs binary search to find wordcount frequency threshold.  
  
    Given upper and lower bounds and a list of (word, count) tuples, performs
    binary search to find the threshold closest to producing a vocabulary
    of size vocab_size.

    Args:
      word_counts: list of (string, int) tuples
      lower: int, lower bound for binary search
      upper: int, upper bound for binary search
      params: Params namedtuple, parameters for learning

    Returns:
      list of strings, vocab that is closest to target vocab_size
    """
    thresh = (upper + lower) // 2
    current_vocab = learn_with_thresh(word_counts, thresh, params)  
    current_vocab_size = len(current_vocab)  
  
    # Allow count to be within k% of the target count, where k is slack ratio.  
    slack_count = params.slack_ratio * params.vocab_size  
    if slack_count < 0:  
        slack_count = 0  
  
    is_within_slack = (current_vocab_size <= params.vocab_size) and (  
            params.vocab_size - current_vocab_size <= slack_count)  
  
    # We've created a vocab within our goal range (or, ran out of search space).  
    if is_within_slack or lower >= upper or thresh <= 1:  
        return current_vocab  
  
    current_vocab = None  
  
    if current_vocab_size > params.vocab_size:  
        return learn_binary_search(word_counts, thresh + 1, upper, params)  
  
    else:  
        return learn_binary_search(word_counts, lower, thresh - 1, params)  
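
The control flow of learn_binary_search can be tried in isolation with a stand-in for the expensive learning step. In the sketch below, search_threshold is an iterative rewrite of the same logic, and toy_vocab_size is a purely hypothetical function that pretends the vocabulary shrinks as the threshold grows; neither name comes from the Google code.

```python
def search_threshold(vocab_size_at, lower, upper, target, slack_ratio=0.05):
    """Binary-search a frequency threshold whose vocab size lands just under target."""
    while True:
        thresh = (lower + upper) // 2
        size = vocab_size_at(thresh)
        within_slack = size <= target and target - size <= slack_ratio * target
        # Stop when the size is acceptable or the search space is exhausted.
        if within_slack or lower >= upper or thresh <= 1:
            return thresh, size
        if size > target:
            lower = thresh + 1  # vocabulary too big: require a higher frequency
        else:
            upper = thresh - 1  # vocabulary too small: accept rarer tokens


def toy_vocab_size(thresh):
    # Hypothetical stand-in for len(learn_with_thresh(...)).
    return 10000 // thresh


print(search_threshold(toy_vocab_size, lower=1, upper=1000, target=100))  # (101, 99)
```

A higher threshold admits fewer tokens, which is why an oversized vocabulary moves the lower bound up rather than down.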

Putting it all together:

def learn(word_counts,  
          vocab_size: int,  
          reserved_tokens: List[str],  
          upper_thresh: Optional[int] = int(1e7),  
          lower_thresh: Optional[int] = 10,  
          num_iterations: int = 4,  
          max_input_tokens: Optional[int] = int(5e6),  
          max_token_length: int = 50,  
          max_unique_chars: int = 1000,  
          slack_ratio: float = 0.05,  
          include_joiner_token: bool = True,  
          joiner: str = '##') -> List[str]:  
    """Takes in wordcounts and returns wordpiece vocabulary.  
  
    Args:
      word_counts: (word, count) pairs as a dictionary, or list of tuples.
      vocab_size: The target vocabulary size. This is the maximum size.
      reserved_tokens: A list of tokens that must be included in the vocabulary.
      upper_thresh: Initial upper bound on the token frequency threshold.
      lower_thresh: Initial lower bound on the token frequency threshold.
      num_iterations: Number of iterations to run.
      max_input_tokens: The maximum number of words in the initial vocabulary. The
        words with the lowest counts are discarded. Use `None` or `-1` for "no
        maximum".
      max_token_length: The maximum token length. Counts for longer words are
        discarded.
      max_unique_chars: The maximum alphabet size. This prevents rare characters
        from inflating the vocabulary. Counts for words containing characters
        outside of the selected alphabet are discarded.
      slack_ratio: The maximum deviation acceptable from `vocab_size` for an
        acceptable vocabulary. The acceptable range of vocabulary sizes is from
        `vocab_size*(1-slack_ratio)` to `vocab_size`.
      include_joiner_token: If true, include the `joiner` token in the output
        vocabulary.
      joiner: The prefix to include on suffix tokens in the output vocabulary.
        Usually "##". For example 'places' could be tokenized as `['place',
        '##s']`.

    Returns:
      list of strings, the final vocabulary
    """
    if isinstance(word_counts, dict):
        word_counts = word_counts.items()  
  
    params = Params(upper_thresh, lower_thresh, num_iterations, max_input_tokens,  
                    max_token_length, max_unique_chars, vocab_size, slack_ratio,  
                    include_joiner_token, joiner, reserved_tokens)  
  
    upper_search, lower_search = get_search_threshs(word_counts,  
                                                    params.upper_thresh,  
                                                    params.lower_thresh)  
  
    all_counts = get_input_words(word_counts, params.reserved_tokens,  
                                 params.max_token_length)  
  
    allowed_chars = get_allowed_chars(all_counts, params.max_unique_chars)  
  
    filtered_counts = filter_input_words(all_counts, allowed_chars,  
                                         params.max_input_tokens)  
  
    vocab = learn_binary_search(filtered_counts, lower_search, upper_search,  
                                params)  
  
    return vocab
