Dataset for Stable Diffusion

1.Dataset for Stable Diffusion

笔记来源:
1.Flickr8k数据集处理
2.处理Flickr8k数据集
3.Github:pytorch-stable-diffusion
4.Flickr 8k Dataset
5.dataset_flickr8k.json
6.About Train, Validation and Test Sets in Machine Learning Tarang Shah Towards Data Science
7.What are hyperparameters?

1.1 Dataset

采用Flicker8k数据集,该数据集有两个文件,第一个文件为Flicker8k_Dataset (全部为图片),第二个文件为Flickr8k.token.txt (含两列image_id和caption),其中一个image_id对应5个caption (sentence)

1.2 Dataset description file

数据集文本描述文件:dataset_flickr8k.json
文件格式如下:
{“images”: [ {“sentids”: [ ],“imgid”: 0,“sentences”:[{“tokens”:[ ]}, {“tokens”:[ ], “raw”: “…”, “imgid”:0, “sentid”:0}, …, “split”: “train”, “filename”: …jpg}, {“sentids”…} ], “dataset”: “flickr8k”}

参数解释
“sentids”:[0,1,2,3,4]caption 的 id 范围(一个image对应5个caption,所以sentids从0到4)
“imgid”:0image 的 id(从0到7999共8000张image)
“sentences”:[ ]包含一张照片的5个caption
“tokens”:[ ]每个caption分割为单个word
“raw”: " "每个token连接起来的caption
“imgid”: 0与caption相匹配的image的id
“sentid”: 0imag0对应的具体的caption的id
“split”:" "将该image和对应caption划分到训练集or验证集or测试集
“filename”:“…jpg”image具体名称

dataset_flickr8k.json

1.3 Process Datasets

下面代码摘自:Flickr8k数据集处理(仅作学习使用)

import json
import os
import random
from collections import Counter, defaultdict
from matplotlib import pyplot as plt
from PIL import Image
from argparse import Namespace
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence
from torch.utils.data import Dataset
import torchvision
import torchvision.transforms as transforms


def create_dataset(dataset='flickr8k', captions_per_image=5, min_word_count=5, max_len=30):
    """
    Parameters:
        dataset: Name of the dataset
        captions_per_image: Number of captions per image
        min_word_count: Only consider words that appear at least this many times in the dataset (excluding the test set)
        max_len: Maximum number of words in a caption. Captions longer than this will be truncated.
    Output:
        A vocabulary file: vocab.json
        Three dataset files: train_data.json, val_data.json, test_data.json
    """
    # Paths for reading data and saving processed data
    # Path to the dataset JSON file
    flickr_json_path = ".../sd/data/dataset_flickr8k.json"
    # Folder containing images
    image_folder = ".../sd/data/Flicker8k_Dataset"
    # Folder to save processed results
    # The % operator is used to format the string by replacing %s with the value of the dataset variable.
    # For example, if dataset is "flickr8k", the resulting output_folder will be
    # /home/wxy/Documents/PycharmProjects/pytorch-stable-diffusion/sd/data/flickr8k.
    output_folder = ".../sd/data/%s" % dataset

    # Ensure output directory exists
    os.makedirs(output_folder, exist_ok=True)
    print(f"Output folder: {output_folder}")

    # Read the dataset JSON file
    with open(file=flickr_json_path, mode="r") as j:
        data = json.load(fp=j)
    # Initialize containers for image paths, captions, and vocabulary
    # Dictionary to store image paths
    image_paths = defaultdict(list)
    # Dictionary to store image captions
    image_captions = defaultdict(list)
    # Count the number of elements, then count and return a dictionary
    # key:element value:the number of elements.
    vocab = Counter()
    # read from file dataset_flickr8k.json
    for img in data["images"]:  # Iterate over each image in the dataset
        split = img["split"]  # Determine the split (train, val, or test) for the image
        captions = []
        for c in img["sentences"]:  # Iterate over each caption for the image
            # Update word frequency count, excluding test set data
            if split != "test":  # Only update vocabulary for train/val splits
                # c['tokens'] is a list, The number of occurrences of each word in the list is increased by one
                vocab.update(c['tokens'])  # Update vocabulary with words in the caption
            # Only consider captions that are within the maximum length
            if len(c["tokens"]) <= max_len:
                captions.append(c["tokens"])  # Add the caption to the list if it meets the length requirement

        if len(captions) == 0:  # Skip images with no valid captions
            continue
        # Construct the full image path/home/wxy/Documents/PycharmProjects/pytorch-stable-diffusion
        # image_folder + image_name
        # ./Flicker8k_Dataset/img['filename']
        path = os.path.join(image_folder, img['filename'])
        # Save the full image path and its captions in the respective dictionaries
        image_paths[split].append(path)
        image_captions[split].append(captions)

    '''
    After the above steps, we have:
    - vocab(a dict) keys:words、values: counts of all words
    - image_paths: (a dict) keys "train", "val", and "test"; values: lists of absolute image paths
    - image_captions: (a dict) keys: "train", "val", and "test"; values: lists of captions
    '''/home/wxy/Documents/PycharmProjects/pytorch-stable-diffusion
		....
		....

我们通过dataset_flickr8k.json文件把数据集转化为三个词典

dictkeyvalue
vacabwordfrequency of words in all captions
image_path“train”、“val”、“test”lists of absolute image path
image_captions“train”、“val”、“test”lists of captions

我们通过Debug打印其中的内容

print(vocab)
print(image_paths["train"][1])
print(image_captions["train"][1])

def create_dataset(dataset='flickr8k', captions_per_image=5, min_word_count=5, max_len=30):
    """
    Parameters:
        dataset: Name of the dataset
        captions_per_image: Number of captions per image
        min_word_count: Only consider words that appear at least this many times in the dataset (excluding the test set)
        max_len: Maximum number of words in a caption. Captions longer than this will be truncated.
    Output:
        A vocabulary file: vocab.json
        Three dataset files: train_data.json, val_data.json, test_data.json
    """
    ....
    ....
    # Create the vocabulary, adding placeholders for special tokens
    # Add placeholders<pad>, unregistered word identifiers<unk>, sentence beginning and end identifiers<start><end>
    words = [w for w in vocab.keys() if vocab[w] > min_word_count]  # Filter words by minimum count
    vocab = {k: v + 1 for v, k in enumerate(words)}  # Create the vocabulary with indices
    # Add special tokens to the vocabulary
    vocab['<pad>'] = 0
    vocab['<unk>'] = len(vocab)
    vocab['<start>'] = len(vocab)
    vocab['<end>'] = len(vocab)

    # Save the vocabulary to a file
    with open(os.path.join(output_folder, 'vocab.json'), "w") as fw:
        json.dump(vocab, fw)

    # Process each dataset split (train, val, test)
    # Iterate over each split: split = "train" 、 split = "val" 和 split = "test"
    for split in image_paths:
        # List of image paths for the split
        imgpaths = image_paths[split]  # type(imgpaths)=list
        # List of captions for the split
        imcaps = image_captions[split]  # type(imcaps)=list
        # store result that converting words of caption to their respective indices in the vocabulary
        enc_captions = []

        for i, path in enumerate(imgpaths):
            # Check if the image can be opened
            img = Image.open(path)
            # Ensure each image has the required number of captions
            if len(imcaps[i]) < captions_per_image:
                filled_num = captions_per_image - len(imcaps[i])
                # Repeat captions if needed
                captions = imcaps[i] + [random.choice(imcaps[i]) for _ in range(0, filled_num)]
            else:
                # Randomly sample captions if there are more than needed
                captions = random.sample(imcaps[i], k=captions_per_image)

            assert len(captions) == captions_per_image

            for j, c in enumerate(captions):
                # Encode each caption by converting words to their respective indices in the vocabulary
                enc_c = [vocab['<start>']] + [vocab.get(word, vocab['<unk>']) for word in c] + [vocab["<end>"]]
                enc_captions.append(enc_c)

        assert len(imgpaths) * captions_per_image == len(enc_captions)

        data = {"IMAGES": imgpaths,
                "CAPTIONS": enc_captions}

        # Save the processed dataset for the current split (train,val,test)
        with open(os.path.join(output_folder, split + "_data.json"), 'w') as fw:
            json.dump(data, fw)


create_dataset()

经过create_dataset函数,我们得到如下图的文件

四个文件的详细内容见下表

train_data.json中的第一个key:IMAGES
train_data.json中的第二个key:CAPTIONS
test_data.json中的第一个key:IMAGES
test_data.json中的第二个key:CAPTIONS
val_data.json中的第一个key:IMAGES
val_data.json中的第二个key:CAPTIONS
vocab.json开始部分
vocab.json结尾部分

生成vocab.json的关键代码
首先统计所有caption中word出现至少大于5次的word,而后给这些word依次赋予一个下标

# Create the vocabulary, adding placeholders for special tokens
    # Add placeholders<pad>, unregistered word identifiers<unk>, sentence beginning and end identifiers<start><end>
    # Create a list of words from the vocabulary that have a frequency higher than 'min_word_count'
    # min_word_count: Only consider words that appear at least this many times in the dataset (excluding the test set)
    words = [w for w in vocab.keys() if vocab[w] > min_word_count]  # Filter words by minimum count
    # assign an index to each word, starting from 1 (indices start from 0, so add 1)
    vocab = {k: v + 1 for v, k in enumerate(words)}  # Create the vocabulary with indices

最终生成vocab.json

生成 [“split”]_data.json 的关键
读入文件dataset_flickr8k.json,并创建两个字典,第一个字典放置每张image的绝对路径,第二个字典放置描述image的caption,根据vocab将token换为下标保存,根据文件dataset_flickr8k.json中不同的split,这image的绝对路径和相应caption保存在不同文件中(train_data.json、test_data.json、val_data.json)

dataset_flickr8k.json

train_data.json


从vocab中获取token的下标得到CAPTION的编码

for j, c in enumerate(captions):
  # Encode each caption by converting words to their respective indices in the vocabulary
  enc_c = [vocab['<start>']] + [vocab.get(word, vocab['<unk>']) for word in c] + [vocab["<end>"]]
  enc_captions.append(enc_c)


尝试使用上面生成的测试集文件test_data.json和vocab.json输出某张image以及对应的caption
下面代码摘自:Flickr8k数据集处理(仅作学习使用)

'''
test
1.Iterates over the 5 captions for 下面代码引用自:[Flickr8k数据集处理](https://blog.csdn.net/weixin_48981284/article/details/134676813)(仅作学习使用)the 250th image.
2.Retrieves the word indices for each caption.
3.Converts the word indices to words using vocab_idx2word.
4.Joins the words to form complete sentences.
5.Prints each caption.
'''
import json
from PIL import Image
from matplotlib import pyplot as plt
# Load the vocabulary from the JSON file
with open('.../sd/data/flickr8k/vocab.json', 'r') as f:
    vocab = json.load(f)  # Load the vocabulary from the JSON file into a dictionary
# Create a dictionary to map indices to words
vocab_idx2word = {idx: word for word, idx in vocab.items()}
# Load the test data from the JSON file
with open('.../sd/data/flickr8k/test_data.json', 'r') as f:
    data = json.load(f)  # Load the test data from the JSON file into a dictionary
# Open and display the 250th image in the test set
# Open the image at index 250 in the 'IMAGES' list
content_img = Image.open(data['IMAGES'][250])
plt.figure(figsize=(6, 6))
plt.subplot(1,1,1)
plt.imshow(content_img)
plt.title('Image')
plt.axis('off')
plt.show()
# Print the lengths of the data, image list, and caption list
# Print the number of keys in the dataset dictionary (should be 2: 'IMAGES' and 'CAPTIONS')
print(len(data))
print(len(data['IMAGES']))  # Print the number of images in the 'IMAGES' list
print(len(data["CAPTIONS"]))  # Print the number of captions in the 'CAPTIONS' list
# Display the captions for the 300th image
# Iterate over the 5 captions associated with the 300th image
for i in range(5):
    # Get the word indices for the i-th caption of the 300th image
    word_indices = data['CAPTIONS'][250 * 5 + i]
    # Convert indices to words and join them to form a caption
    print(''.join([vocab_idx2word[idx] for idx in word_indices]))

data 的 key 有两个 IMAGES 和 CAPTIONS
测试集image有1000张,每张对应5个caption,共5000个caption
第250张图片的5个caption如下图

1.4 Dataloader

下面代码摘自:Flickr8k数据集处理(仅作学习使用)

import json
import os
import random
from collections import Counter, defaultdict
from PIL import Image
import torch
from torch.utils.data import Dataset
from torch.utils import data
import torchvision.transforms as transforms


class ImageTextDataset(Dataset):
    """
    Pytorch Dataset class to generate data batches using torch DataLoader
    """
    def __init__(self, dataset_path, vocab_path, split, captions_per_image=5, max_len=30, transform=None):
        """
        Parameters:
            dataset_path: Path to the JSON file containing the dataset
            vocab_path: Path to the JSON file containing the vocabulary
            split: The dataset split, which can be "train", "val", or "test"
            captions_per_image: Number of captions per image
            max_len: Maximum number of words per caption
            transform: Image transformation methods
        """
        self.split = split
        # Validate that the split is one of the allowed values
        assert self.split in {"train", "val", "test"}
        # Store captions per image
        self.cpi = captions_per_image
        # Store maximum caption length
        self.max_len = max_len

        # Load the dataset
        with open(dataset_path, "r") as f:
            self.data = json.load(f)

        # Load the vocabulary
        with open(vocab_path, "r") as f:
            self.vocab = json.load(f)

        # Store the image transformation methods
        self.transform = transform

        # Number of captions in the dataset
        # Calculate the size of the dataset
        self.dataset_size = len(self.data["CAPTIONS"])

    def __getitem__(self, i):
        """
            Retrieve the i-th sample from the dataset
        """
        # Get [i // self.cpi]-th image corresponding to the i-th sample (each image has multiple captions)
        img = Image.open(self.data['IMAGES'][i // self.cpi]).convert("RGB")
        # Apply image transformation if provided
        if self.transform is not None:
            # Apply the transformation to the image
            img = self.transform(img)
        # Get the length of the caption
        caplen = len(self.data["CAPTIONS"][i])
        # Pad the caption if its length is less than max_len
        pad_caps = [self.vocab['<pad>']] * (self.max_len + 2 - caplen)
        # Convert the caption to a tensor and pad it
        caption = torch.LongTensor(self.data["CAPTIONS"][i] + pad_caps)
        return img, caption, caplen  # Return the image, caption, and caption length

    def __len__(self):
        return self.dataset_size  # Number of samples in the dataset


def make_train_val(data_dir, vocab_path, batch_size, workers=4):
    """
        Create DataLoader objects for training, validation, and testing sets.
        Parameters:
            data_dir: Directory where the dataset JSON files are located
            vocab_path: Path to the vocabulary JSON file
            batch_size: Number of samples per batch
            workers: Number of subprocesses to use for data loading (default is 4)
        Returns:
            train_loader: DataLoader for the training set
            val_loader: DataLoader for the validation set
            test_loader: DataLoader for the test set
    """
    # Define transformation for training set
    train_tx = transforms.Compose([
        transforms.Resize(256),  # Resize images to 256x256
        transforms.ToTensor(),  # Convert image to PyTorch tensor
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize using ImageNet mean and std
    ])

    val_tx = transforms.Compose([
        transforms.Resize(256),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    # Create dataset objects for training, validation, and test sets
    train_set = ImageTextDataset(dataset_path=os.path.join(data_dir, "train_data.json"), vocab_path=vocab_path,
                                 split="train", transform=train_tx)

    vaild_set = ImageTextDataset(dataset_path=os.path.join(data_dir, "val_data.json"), vocab_path=vocab_path,
                                 split="val", transform=val_tx)

    test_set = ImageTextDataset(dataset_path=os.path.join(data_dir, "test_data.json"), vocab_path=vocab_path,
                                split="test", transform=val_tx)
    # Create DataLoader for training set with data shuffling
    train_loder = data.DataLoader(
        dataset=train_set, batch_size=batch_size, shuffer=True,
        num_workers=workers, pin_memory=True
    )
    # Create DataLoader for validation set without data shuffling
    val_loder = data.DataLoader(
        dataset=vaild_set, batch_size=batch_size, shuffer=False,
        num_workers=workers, pin_memory=True, drop_last=False
    )
    # Create DataLoader for test set without data shuffling
    test_loder = data.DataLoader(
        dataset=test_set, batch_size=batch_size, shuffer=False,
        num_workers=workers, pin_memory=True, drop_last=False
    )

    return train_loder, val_loder, test_loder

创建好train_loader后,接下来我们就可以着手开始训练SD了!

1.5 Training、Validation、Test Set

了解训练集、测试集、验证集的作用

训练集
用于模型训练阶段

Training Dataset: The sample of data used to fit the model.

The actual dataset that we use to train the model (weights and biases in the case of a Neural Network). The model sees and learns from this data.

验证集
用于模型调参阶段

Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.

The validation set is used to evaluate a given model, but this is for frequent evaluation. We, as machine learning engineers, use this data to fine-tune the model hyperparameters. Hence the model occasionally sees this data, but never does it “Learn” from this. We use the validation set results, and update higher level hyperparameters. So the validation set affects a model, but only indirectly. The validation set is also known as the Dev set or the Development set. This makes sense since this dataset helps during the “development” stage of the model.

测试集
在模型训练且调参阶段完成后测试模型性能

Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.

The Test dataset provides the gold standard used to evaluate the model. It is only used once a model is completely trained(using the train and validation sets). The test set is generally what is used to evaluate competing models (For example on many Kaggle competitions, the validation set is released initially along with the training set and the actual test set is only released when the competition is about to close, and it is the result of the the model on the Test set that decides the winner). Many a times the validation set is used as the test set, but it is not good practice. The test set is generally well curated. It contains carefully sampled data that spans the various classes that the model would face, when used in the real world.

数据集划分比例
Now that you know what these datasets do, you might be looking for recommendations on how to split your dataset into Train, Validation and Test sets.

This mainly depends on 2 things. First, the total number of samples in your data and second, on the actual model you are training.

Some models need substantial data to train upon, so in this case you would optimize for the larger training sets. Models with very few hyperparameters will be easy to validate and tune, so you can probably reduce the size of your validation set, but if your model has many hyperparameters, you would want to have a large validation set as well(although you should also consider cross validation). Also, if you happen to have a model with no hyperparameters or ones that cannot be easily tuned, you probably don’t need a validation set too!

All in all, like many other things in machine learning, the train-test-validation split ratio is also quite specific to your use case and it gets easier to make judge ment as you train and build more and more models.

What are hyperparameters?
Hyperparameters are external configuration variables that data scientists use to manage machine learning model training. Sometimes called model hyperparameters, the hyperparameters are manually set before training a model. They’re different from parameters, which are internal parameters automatically derived during the learning process and not set by data scientists.
Examples of hyperparameters include the number of nodes and layers in a neural network and the number of branches in a decision tree. Hyperparameters determine key features such as model architecture, learning rate, and model complexity.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mfbz.cn/a/797916.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

提升机器视觉与机器学习软件安全性的实践策略

在近几年科技爆发中&#xff0c;机器学习&#xff08;ML&#xff09;和机器视觉&#xff08;MV&#xff09;的结合正在改变各行各业。机器学习通过数据驱动的算法让计算机能够自我学习&#xff0c;而机器视觉赋予计算机识别和理解图像的能力。这种结合使得计算机可以高效地执行…

fastadmin后台无法删除文件,如何解决?

&#x1f3c6;本文收录于《CSDN问答解答》专栏&#xff0c;主要记录项目实战过程中的Bug之前因后果及提供真实有效的解决方案&#xff0c;希望能够助你一臂之力&#xff0c;帮你早日登顶实现财富自由&#x1f680;&#xff1b;同时&#xff0c;欢迎大家关注&&收藏&…

牛客小白月赛98 (个人题解)(补全)

前言&#xff1a; 昨天晚上自己一个人打的小白月赛&#xff08;因为准备数学期末已经写烦了&#xff09;&#xff0c;题目难度感觉越来越简单了&#xff08;不在像以前一样根本写不了一点&#xff0c;现在看题解已经能看懂一点了&#xff09;&#xff0c;能感受到自己在不断进步…

智算网络谜题,与“解密者”新华三

根据高盛研究公司&#xff08;GSR&#xff09;数据报告显示&#xff0c;AIGC将推动全球国民生产总值&#xff08;GDP&#xff09;增长7%&#xff0c;带来近7万亿美元的GDP增长&#xff0c;并在未来使生产力提高1.5%。面对如此巨大的价值涌现&#xff0c;每个行业、每家企业都希…

JAVASE进阶day07(泛型,集合,Set,TreeSet,枚举,数据结构)

泛型 1.泛型的基本使用 限制集合存储的数据类型 package com.lu.day07.generics;/*** 定义了一个泛型类* E 泛型通配字母(不固定代替真实数据类型A-Z都可以)* 常见的泛型通配字母:* E:element 元素* T:type 类型* R:return 返回值类型* K:key 键* …

CV09_深度学习模块之间的缝合教学(4)--调参

深度学习就像炼丹。炉子就是模型&#xff0c;火候就是那些参数&#xff0c;材料就是数据集。 1.1 参数有哪些 调参调参&#xff0c;参数到底是哪些参数&#xff1f; 1.网络相关的参数&#xff1a;&#xff08;1&#xff09;神经网络网络层 &#xff08;2&#xff09;隐藏层…

SvANet:微小医学目标分割网络,增强早期疾病检测

SvANet&#xff1a;微小医学目标分割网络&#xff0c;增强早期疾病检测 提出背景前人工作医学对象分割微小医学对象分割注意力机制 SvANet 结构图SvANet 解法拆解解法逻辑链 论文&#xff1a;SvANet: A Scale-variant Attention-based Network for Small Medical Object Segmen…

PHP7.4安装使用rabbitMQ教程(windows)

&#xff08;1&#xff09;&#xff0c;安装rabbitMQ客户端erlang语言 一&#xff0c;erlang语言安装 下载地址1—— 下载地址2——https://www.erlang.org/patches/otp-27.0 二&#xff0c;rabbitMQ客户端安装 https://www.rabbitmq.com/docs/install-windows &#xff08…

Python+wxauto=微信自动化?

Pythonwxauto微信自动化&#xff1f; 一、wxauto库简介 1.什么是wxauto库 wxauto是一个基于UIAutomation的开源Python微信自动化库。它旨在帮助用户通过编写Python脚本&#xff0c;轻松实现对微信客户端的自动化操作&#xff0c;从而提升效率并满足个性化需求。这一工具的出现&…

【Linux】重定向 | 为什么说”一切皆文件?“

目录 前言 1.文件描述符分配规则 2.dup2 重定向接口 3.重定向 3.1>输出重定向 3.2>>追加重定向 3.3<输入重定向 3.4 shell 模拟实现< > 3.5 理解> 4. 理解“Linux 下一切皆文件” 前言 问&#xff1a;fd 为什么默认从 3 开始&#xff0c;而不是…

深度学习-6-自编码器和去噪自动编码器和变分自编码器

参考keras基于自编码器的语音信号降噪 参考今天来介绍一下什么是去噪自动编码器(DenoisingAutoencoder) 1 keras实现自编码器图像去噪 自编码器是一种简单的人工神经网络 (ANN),经过训练可以学习输入数据的编码表示,这种无监督机制不需要标签。自编码器由两个神经网络组…

【练习】分治--归并排序

&#x1f3a5; 个人主页&#xff1a;Dikz12&#x1f525;个人专栏&#xff1a;算法(Java)&#x1f4d5;格言&#xff1a;吾愚多不敏&#xff0c;而愿加学欢迎大家&#x1f44d;点赞✍评论⭐收藏 目录 归并排序 代码实现 交易逆序对的总数 题目描述 ​编辑 题解 代码实…

前端Vue组件化实践:打造灵活可维护的地址管理组件

随着前端技术的不断演进&#xff0c;复杂度和开发难度也随之上升。传统的一体化开发模式使得每次小小的修改或功能增加都可能牵一发而动全身&#xff0c;严重影响了开发效率和维护成本。组件化开发作为一种解决方案&#xff0c;通过模块化、独立化的开发方式&#xff0c;实现了…

云计算【第一阶段(29)】远程访问及控制

一、ssh远程管理 1.1、ssh (secureshell)协议 是一种安全通道协议对通信数据进行了加密处理&#xff0c;用于远程管理功能SSH 协议对通信双方的数据传输进行了加密处理&#xff0c;其中包括用户登录时输入的用户口令&#xff0c;建立在应用层和传输层基础上的安全协议。SSH客…

SQL 多变关联使用子查询去重

不去重状态 select a.*,b.recon_amt from free_settlement_first aleft join free_settlement_second b on a.settlement_first_id b.settlement_first_id 有2条数据出现了重复 使用子查询去重 select a.*,b.recon_amt from free_settlement_first aleft join free_settlem…

谈谈软件交互设计

谈谈软件交互设计 交互设计的由来 交互设计(Interaction Design)这一概念,最初是由IDEO创始人之一Bill.Moggridge(莫格里奇)1984年在一次会议上提出。他设计了世界上第一台笔记本电脑Compass,并写作出版了在交互设计领域影响深远的《Designing Interactions》一书,被称…

Azcopy Sync同步Azure文件共享

Azcopy Sync同步Azure文件共享 一、工作原理二、安装 AzCopy在 Windows 上在 Linux 上 三、资源准备1. 创建源和目标 Azure 存储账户2. 创建源和目标文件共享3. 确定路径4. 生成源和目的存储账户的共享访问签名&#xff08;SAS&#xff09;令牌配置权限示例生成的 URL 四、Azco…

AI算法14-套索回归算法Lasso Regression | LR

套索回归算法概述 套索回归算法简介 在统计学和机器学习中&#xff0c;套索回归是一种同时进行特征选择和正则化&#xff08;数学&#xff09;的回归分析方法&#xff0c;旨在增强统计模型的预测准确性和可解释性&#xff0c; 正则化是一种回归的形式&#xff0c;它将系数估…

课程的概述

课程概述 课程类型 课程理论流派 制约课程开发的因素 课程设计的概念及两种模式 课程内容 课程评价 新课程改革理念

前一段时间比较火的刷网课平台源码,带数据库和教程

前一段时间比较火的刷网课平台源码&#xff0c;带数据库和教程。 好在疫情已经结束了&#xff0c;希望今后世上再无网课。 这个代码免费提供给大家学习开发用吧&#xff0c;作为一个php的入门学习案例用用还可以。 使用办法 网站根目录解压 打开nginx.htaccess文件&#x…