优化算法---遗传算法

一、基本定义

遗传算法(Genetic Algorithm,GA)是模仿自然界生物进化机制发展起来的随机全局搜索和优化方法，借鉴了达尔文的进化论和孟德尔的遗传学说。其本质是一种高效、并行、全局搜索的方法，能在搜索过程中自动获取和积累有关搜索空间的知识，并自适应地控制搜索过程以求得最佳解。

1.1 遗传与变异

构成生物的基本结构和功能单位是细胞(Cell)。细胞中含有的一种微小的丝状化合物称为染色体(Chromosome)，生物的所有遗传信息都包含在这个复杂而又微小的染色体中。遗传信息是由基因组成，生物的各种性状由其相应的基因所控制，基因是遗传的基本单位。细胞通过分裂具有自我复制的能力，在细胞分裂的过程中，其遗传基因也同时被复制到下一代从而其性状也被下一代所继承。
细胞在分裂时，遗传物质DNA通过复制(Reproduction)而转移到新产生的细胞中，新细胞就继承了旧细胞的基因。有性生殖生物在繁殖下一代时，两个同源染色体之间通过交叉(Crossover)而重组，即在两个染色体的某一相同位置处DNA被切断，其前后两串分别交叉组合而形成两个新的染色体。另外，在进行细胞复制时，虽然概率很小，但也有可能产生某些复制差错，从而使DNA发生某种变异(Mutation)，产生出新的染色体。
在这里插入图片描述

1.2 进化

生物在其延续生存的过程中，逐渐适应于其生存环境，使得其品质不断得到改良，这种生命现象称为进化(Evolution)。生物的进化是以集团的形式共同进行的，这样的一个团体称为群体(Population)，组成群体的单个生物称为个体(Individual)，每一个个体对其生存环境都有不同的适应能力（解决问题的好坏），这种适应能力称为个体的适应度(Fitness)。
核心：
1.种群的交配、繁衍后代：遗传与变异
2.自然的选择：根据适应度进行选择，保留优质个体并淘汰劣质个体。
目标：通过多次模拟生物进化，让种群逐步向更优的解进化，直到找到问题的最佳解或接近最优解。

二、算法简介

2.1 基本原理

生物的进化过程主要是通过染色体之间的交叉和染色体的变异来完成的，与此相对应，遗传算法中最优解的搜索过程也模仿生物的这个进化过程，使用所谓的遗传算子作用于群体 $P (t)$ 中，进行下述遗传操作，从而得到新一代群体 $P (t + 1)$ 。

选择： 根据各个个体的适应度(函数)，按照一定的规则或方法，从第 $t$ 代群体 $P (t)$ 中选择出一些优良的个体遗传到下一代群体 $P (t + 1)$ 中。
交叉： 将群体 $P (t)$ 内的各个个体随机搭配成对，对每一对个体，以某个概率(称为交叉概率，crossover rate)交换它们之间的部分染色体。
变异： 对群体 $P (t)$ 中的每一个个体，以某概率(称为变异概率，mutation rate)改变某一个或某一些基因座上的基因值为其他的等位基因。

2.2 算法步骤

1、初始化种群
随机生成一组“解”，即多个“个体”，这些个体组成了“种群”。
每个个体由一串基因编码表示（可以是二进制、实数或其他形式）。
2、评估适应度
对每个个体进行评估，计算其“适应度”，即这个解有多好。适应度越高表示解越优。
3、选择（自然选择）
根据适应度选择“优秀个体”作为父代，淘汰适应度低的个体。
模仿自然界优胜劣汰的过程，让优秀个体更有可能传递自己的基因。
4、交叉（基因重组）
从选出的父代中随机挑选两名“父母”，交换它们的部分基因，产生“子代”。比如，将两个个体基因的一部分互换，模拟生物繁殖。
5、变异（基因变异）
随机改变某些个体的基因（比如基因翻转0变1或1变0），模拟基因突变现象。
突变可以防止算法陷入局部最优，增加探索新解的能力。
6、迭代进化
重复以上步骤（选择、交叉、突变），让种群逐代进化，不断提高整体适应度。
7、停止条件
当达到设定的迭代次数或适应度达到期望值时，停止进化并输出最优解。
在这里插入图片描述

2.3 算法案例

2.3.1 最大值求解

将个体适应度大小映射为概率进行复制：适应度高的个体有更大概率复制，且复制的份数越条——轮盘赌法。

import random

# 1. 初始化种群
def init_population(pop_size, gene_length):
    # 初始化一个种群，种群中有 pop_size 个个体，每个个体是一个随机整数，表示基因值
    # 基因长度 gene_length 决定了基因的取值范围 (0 到 2^gene_length - 1)
    return [random.randint(0, 2**gene_length - 1) for _ in range(pop_size)]

# 2. 适应度函数
def fitness(x):
    # 适应度函数：这里我们以 x 的平方作为适应度，目标是最大化 f(x) = x^2
    return x**2

# 3. 选择操作（轮盘赌选择）
def selection(population, fitness_scores):
    # 计算适应度的总和
    total_fitness = sum(fitness_scores)
    
    # 计算每个个体被选中的概率，适应度越高，被选中的概率越大
    prob = [f / total_fitness for f in fitness_scores]
    
    # 使用轮盘赌选择法，按照适应度概率选择两个父母
    selected = random.choices(population, prob, k=2)
    
    # 返回选中的两个父母
    return selected

# 4. 交叉操作（单点交叉）
def crossover(parent1, parent2, crossover_rate=0.7, gene_length=5):
    # 判断是否进行交叉操作，交叉概率为 crossover_rate（默认为 70%）
    if random.random() < crossover_rate:
        # 随机选择一个交叉点（基因长度范围内）
        point = random.randint(1, gene_length - 1)
        
        # 使用位操作实现交叉
        mask = (1 << point) - 1  # 生成一个掩码，用于分割基因
        # child1 和 child2 是交叉后的两个子代个体
        child1 = (parent1 & mask) | (parent2 & ~mask)  # 从 parent1 和 parent2 中选取基因
        child2 = (parent2 & mask) | (parent1 & ~mask)
        
        # 将交叉后的基因限制在 [0, 31] 范围内（即 gene_length 位的范围）
        child1 = child1 % (2**gene_length)
        child2 = child2 % (2**gene_length)
        
        # 返回交叉生成的两个子代个体
        return child1, child2
    else:
        # 如果没有进行交叉，直接返回父代个体
        return parent1, parent2

# 5. 突变操作
def mutation(child, mutation_rate=0.01, gene_length=5):
    # 判断是否进行突变操作，突变概率为 mutation_rate（默认为 1%）
    if random.random() < mutation_rate:
        # 随机选择一个基因位点进行突变
        point = random.randint(0, gene_length - 1)
        
        # 通过异或操作对基因进行突变
        child ^= (1 << point)  # 翻转该位置的基因
        
        # 确保突变后的基因值在 [0, 31] 范围内
        child = child % (2**gene_length)
    
    # 返回突变后的个体
    return child

# 6. 遗传算法主循环
def genetic_algorithm(pop_size, gene_length, generations):
    # 初始化种群
    population = init_population(pop_size, gene_length)
    
    # 用于追踪当前最优个体
    best_individual = None
    best_fitness_value = -1  # 初始适应度设置为一个较小值
    
    # 迭代多代
    for generation in range(generations):
        # 计算每个个体的适应度
        fitness_scores = [fitness(individual) for individual in population]
        
        # 找到当前种群中的最优个体
        max_fitness = max(fitness_scores)
        max_fitness_index = fitness_scores.index(max_fitness)
        
        # 如果当前最优个体的适应度大于历史最优适应度，则更新最优个体
        if max_fitness > best_fitness_value:
            best_fitness_value = max_fitness
            best_individual = population[max_fitness_index]

        # 打印当前代数和最优个体及其适应度
        print(f"Generation {generation}: Best individual = {best_individual}, Fitness = {best_fitness_value}")
        
        # 创建新一代种群
        new_population = []
        
        # 保证最优个体进入新一代
        new_population.append(best_individual)
        
        # 进行选择、交叉、突变生成新的个体，直到新种群满员
        while len(new_population) < pop_size:
            # 选择两个父代个体
            parent1, parent2 = selection(population, fitness_scores)
            
            # 对父代个体进行交叉操作，生成两个子代个体
            child1, child2 = crossover(parent1, parent2, gene_length=gene_length)
            
            # 对子代个体进行突变操作
            new_population.append(mutation(child1, gene_length=gene_length))
            
            # 确保新种群的个体数量不会超过 pop_size
            if len(new_population) < pop_size:
                new_population.append(mutation(child2, gene_length=gene_length))
        
        # 更新种群为新生成的种群
        population = new_population
    
    # 返回最优个体和最优适应度
    return best_individual, best_fitness_value

# 7. 设置参数并运行算法
pop_size = 100  # 种群大小
gene_length = 5  # 基因长度（表示0到31之间的数）
generations = 50  # 迭代代数

# 运行遗传算法，得到最优个体及其适应度
best_individual, best_fitness = genetic_algorithm(pop_size, gene_length, generations)

# 打印最终找到的最优解
print(f"\nOptimal solution found: x = {best_individual}, f(x) = {best_fitness}")

在这里插入图片描述

2.3.2 旅行商问题求解

import random
import numpy as np

# 1. 计算城市间的距离
def calculate_distance(city1, city2):
    # 假设城市坐标为二维坐标
    return np.sqrt((city1[0] - city2[0])**2 + (city1[1] - city2[1])**2)

# 2. 初始化种群
def init_population(pop_size, num_cities):
    # 生成初始种群，每个个体是一个城市的访问顺序（城市的索引）
    population = []
    for _ in range(pop_size):
        individual = list(range(num_cities))
        random.shuffle(individual)
        population.append(individual)
    return population

# 3. 计算个体的适应度（路径长度）
def fitness(individual, cities):
    total_distance = 0
    # 计算路径总长度
    for i in range(len(individual) - 1):
        total_distance += calculate_distance(cities[individual[i]], cities[individual[i + 1]])
    # 加上从最后一个城市返回到起点的距离
    total_distance += calculate_distance(cities[individual[-1]], cities[individual[0]])
    return total_distance

# 4. 选择操作（轮盘赌选择）
def selection(population, fitness_scores):
    total_fitness = sum(fitness_scores)
    prob = [1 / f for f in fitness_scores]  # 使用倒数适应度，适应度小的个体更适合选择
    prob = [p / sum(prob) for p in prob]  # 归一化
    selected = random.choices(population, prob, k=2)
    return selected

# 5. 交叉操作（部分匹配交叉 PMX）
def crossover(parent1, parent2):
    # 随机选择交叉区间
    start, end = sorted(random.sample(range(len(parent1)), 2))
    
    # 创建子代，子代初始化为父代1的副本
    child1 = [-1] * len(parent1)
    child2 = [-1] * len(parent1)

    # 将父代的交叉区间拷贝到子代中
    child1[start:end+1] = parent1[start:end+1]
    child2[start:end+1] = parent2[start:end+1]

    # 对于子代的其余部分，填充剩余的城市，保持顺序
    def fill_child(child, parent, other_parent):
        for i in range(len(parent)):
            if child[i] == -1:
                gene = parent[i]
                while gene in child:  # 如果该基因已经存在于子代中，跳过
                    gene = other_parent[other_parent.index(gene) + 1]
                child[i] = gene
        return child

    child1 = fill_child(child1, parent1, parent2)
    child2 = fill_child(child2, parent2, parent1)
    
    return child1, child2

# 6. 突变操作（交换突变）
def mutation(child, mutation_rate=0.02):
    if random.random() < mutation_rate:
        # 随机选择两个城市交换位置
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child

# 7. 遗传算法主循环
def genetic_algorithm(cities, pop_size=100, generations=500, mutation_rate=0.02):
    num_cities = len(cities)
    
    # 初始化种群
    population = init_population(pop_size, num_cities)
    
    best_individual = None
    best_fitness_value = float('inf')  # 用于追踪最优个体
    
    for generation in range(generations):
        # 计算种群中每个个体的适应度
        fitness_scores = [fitness(individual, cities) for individual in population]
        
        # 找到当前最优解
        min_fitness = min(fitness_scores)
        min_fitness_index = fitness_scores.index(min_fitness)
        if min_fitness < best_fitness_value:
            best_fitness_value = min_fitness
            best_individual = population[min_fitness_index]
        
        # 打印当前代和最优适应度
        print(f"Generation {generation}: Best fitness = {best_fitness_value}")
        
        # 创建新种群
        new_population = []
        
        # 保证最优个体进入新一代
        new_population.append(best_individual)
        
        # 进行选择、交叉、突变生成新的个体
        while len(new_population) < pop_size:
            parent1, parent2 = selection(population, fitness_scores)
            child1, child2 = crossover(parent1, parent2)
            new_population.append(mutation(child1, mutation_rate))
            if len(new_population) < pop_size:
                new_population.append(mutation(child2, mutation_rate))
        
        # 更新种群为新生成的种群
        population = new_population
    
    return best_individual, best_fitness_value

# 8. 设置参数并运行算法
# 城市位置：每个城市的坐标是一个二维点
cities = [(0, 0), (1, 3), (4, 3), (6, 1), (3, 0), (5, 5), (7, 2)]

best_individual, best_fitness = genetic_algorithm(cities, pop_size=100, generations=500)
print(f"\nOptimal solution found: Path = {best_individual}, Distance = {best_fitness}")

在这里插入图片描述

2.4 算法优缺点

优点：
1、通用性强：适合解决各种复杂优化问题，不需要知道问题的数学性质。
2、全局搜索能力强：不会陷入局部最优解，探索能力较强。
3、易于并行计算：可以多线程或分布式运行。
缺点：
1、计算复杂度高：大规模问题时计算量较大。
2、参数敏感性强：需要调整交叉率、突变率等参数才能达到最优效果。
3、早熟收敛：可能在局部最优点附近停滞，难以进一步优化。