An In-Depth Look at Multithreaded Programming: Multithreading Explained from 0 to 1 (Part 2)

Contents

6. Deadlock
- 6.1 Deadlock
  - Causes
- 6.2 Ways to avoid deadlock
  - Consistent lock ordering
  - Timeout mechanism
  - Deadlock detection and recovery

6. Deadlock

6.1 Deadlock

Causes

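Competition for system resources (forming a cycle): when the number of resources shared by several processes is not enough to satisfy their needs, the processes compete for those resources and can deadlock. For example, suppose two processes hold resources R1 and R2 respectively. If process 1 then requests R2 while process 2 requests R1, both block because the resource each one needs is held by the other.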

#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>
#include <string>

std::timed_mutex resourceR1, resourceR2;

bool acquireResource(std::timed_mutex& r, const std::string& name) {
    std::chrono::milliseconds timeout(5000);  // 5-second timeout
    if (r.try_lock_for(timeout)) {
        std::cout << "Process " << name << " has acquired its resource." << std::endl;
        return true;
    }
    else {
        std::cout << "Process " << name << " failed to acquire the resource within 5 seconds. Terminating..." << std::endl;
        return false;
    }
}

void process1() {
    if (acquireResource(resourceR1, "1")) {
        // R1 acquired; now try to acquire R2
        if (!acquireResource(resourceR2, "1")) {
            // Failed to acquire R2: release R1 and end the thread
            resourceR1.unlock();
            return;
        }

        /********************************************************/
        // Business logic goes here (not reached when both threads time out)
        /********************************************************/

        resourceR1.unlock();
        resourceR2.unlock();
    }
}

void process2() {
    if (acquireResource(resourceR2, "2")) {
        if (!acquireResource(resourceR1, "2")) {
            resourceR2.unlock();
            return;
        }

        // Likewise, the business logic here is not reached when both threads time out
        resourceR1.unlock();
        resourceR2.unlock();
    }
}

int main() {
    std::thread t1(process1);
    std::thread t2(process2);

    t1.join();
    t2.join();

    return 0;
}


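Logic errors: a flaw in program logic, such as an infinite loop or an unbounded wait, can cause deadlock. For example, in a data exchange, if a message sent by one side is lost, the sender keeps waiting for a reply while the receiver waits indefinitely for the message, so neither side makes progress.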

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <chrono>

std::mutex mtx;
std::condition_variable cv1, cv2;
bool messageReceived = false;
bool acknowledgmentSent = false;

// Sender thread
void senderThread() {
    std::cout << "Sender: Sending data...\n";
    // Assume the data is sent here (the actual send logic is omitted)

    std::unique_lock<std::mutex> lk(mtx);
    auto timeout = std::chrono::system_clock::now() + std::chrono::seconds(5);
    // The predicate overload re-checks the flag after spurious wakeups and
    // returns false only if the deadline passes with no acknowledgment
    if (!cv1.wait_until(lk, timeout, [] { return acknowledgmentSent; })) {
        std::cout << "Sender: Timeout occurred, assuming no acknowledgement received and exiting.\n";
    }
}

// Receiver thread
void receiverThread() {
    std::this_thread::sleep_for(std::chrono::seconds(2)); // Assume the message is lost during this interval

    std::unique_lock<std::mutex> lk(mtx);
    std::cout << "Receiver: Received data...\n";
    messageReceived = true;
    // Assume this is how the receiver sends its acknowledgment. The sender
    // waits on cv1 and acknowledgmentSent is never set, so it never arrives.
    cv2.notify_one();

    // The receiver also waits for the sender to confirm the acknowledgment
    // (a logic error; real applications usually do not need this).
    // Because messageReceived is already true, this wait returns immediately.
    auto timeout = std::chrono::system_clock::now() + std::chrono::seconds(5);
    if (!cv2.wait_until(lk, timeout, [] { return messageReceived; })) {
        std::cout << "Receiver: Timeout occurred, assuming message not delivered and exiting.\n";
    }
}

int main() {
    std::thread t1(senderThread);
    std::thread t2(receiverThread);

    t1.join();
    t2.join();

    return 0;
}

(Figure: program output after two seconds and after five seconds.)
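For contrast, here is a minimal sketch of an exchange in which the receiver really does acknowledge the message and every wait is bounded. The names `Handshake` and `runExchange` are hypothetical and exist only for this illustration; the `dropMessage` flag simulates the lost message described above. It is one possible arrangement under those assumptions, not the only way to structure such a protocol.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state for one message/acknowledgment exchange.
struct Handshake {
    std::mutex mtx;
    std::condition_variable cv;
    bool messageSent = false;
    bool ackSent = false;
};

// Runs one exchange. If dropMessage is true the message is "lost" and the
// receiver never sees it. Returns true if the sender received an
// acknowledgment before the timeout expired.
bool runExchange(bool dropMessage, std::chrono::milliseconds timeout) {
    Handshake h;

    std::thread receiver([&h, timeout] {
        std::unique_lock<std::mutex> lk(h.mtx);
        // Bounded wait with a predicate: spurious wakeups are handled and
        // the thread cannot block forever if the message never arrives.
        if (h.cv.wait_for(lk, timeout, [&h] { return h.messageSent; })) {
            h.ackSent = true;
            lk.unlock();
            h.cv.notify_all();  // acknowledge the message
        }
    });

    bool acked = false;
    {
        std::unique_lock<std::mutex> lk(h.mtx);
        if (!dropMessage) {
            h.messageSent = true;
            h.cv.notify_all();
        }
        // Also a bounded wait: the sender gives up after the timeout.
        acked = h.cv.wait_for(lk, timeout, [&h] { return h.ackSent; });
    }

    receiver.join();
    return acked;
}
```

Calling `runExchange(false, std::chrono::milliseconds(500))` should report a successful acknowledgment, while `runExchange(true, ...)` lets both sides time out and return instead of waiting on each other indefinitely.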

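Improper synchronization: in concurrent programs, an inappropriate synchronization scheme can cause deadlock. For example, when several threads are each waiting for another thread to release a lock while holding a lock that the other thread needs, none of them can proceed.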

#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>

std::timed_mutex mtx1, mtx2;

void threadFunction1() {
    if (mtx1.try_lock_for(std::chrono::seconds(5))) {
        std::cout << "Thread 1: Acquired mtx1\n";

        // Try to acquire mtx2; if it is not acquired within 5 seconds, release mtx1 to prevent deadlock
        if (!mtx2.try_lock_for(std::chrono::seconds(5))) {
            mtx1.unlock();
            std::cout << "Thread 1: Could not acquire mtx2 within 5 seconds, releasing mtx1 to prevent deadlock.\n";
            return;
        }

        std::cout << "Thread 1: Acquired mtx2\n";
        mtx2.unlock();
        mtx1.unlock();
    }
    else {
        std::cout << "Thread 1: Could not acquire mtx1 within 5 seconds.\n";
    }
}

void threadFunction2() {
    if (mtx2.try_lock_for(std::chrono::seconds(5))) {
        std::cout << "Thread 2: Acquired mtx2\n";

        if (!mtx1.try_lock_for(std::chrono::seconds(5))) {
            mtx2.unlock();
            std::cout << "Thread 2: Could not acquire mtx1 within 5 seconds, releasing mtx2 to prevent deadlock.\n";
            return;
        }

        std::cout << "Thread 2: Acquired mtx1\n";
        mtx1.unlock();
        mtx2.unlock();
    }
    else {
        std::cout << "Thread 2: Could not acquire mtx2 within 5 seconds.\n";
    }
}

int main() {
    std::thread t1(threadFunction1);
    std::thread t2(threadFunction2);

    t1.join();
    t2.join();

    return 0;
}


6.2 Ways to avoid deadlock

Consistent lock ordering.

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx1, mtx2;

// Define a fixed global lock order
const bool lockOrder[] = {true, false}; // lock mtx1 first, then mtx2

void worker(int id) {
    if (lockOrder[0]) {
        mtx1.lock();
        std::cout << "Thread " << id << ": Acquired mtx1\n";

        // While holding mtx1, acquire mtx2
        mtx2.lock();
        std::cout << "Thread " << id << ": Acquired mtx2\n";
    } else {
        // If the defined order is to lock mtx2 first
        mtx2.lock();
        std::cout << "Thread " << id << ": Acquired mtx2\n";

        // While holding mtx2, acquire mtx1
        mtx1.lock();
        std::cout << "Thread " << id << ": Acquired mtx1\n";
    }

    // Business logic that needs both locks goes here...

    // Release in the reverse of the default acquisition order
    mtx2.unlock();
    mtx1.unlock();
}

int main() {
    std::thread t1(worker, 1);
    std::thread t2(worker, 2);

    t1.join();
    t2.join();

    return 0;
}

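In the example above, a predefined global array, lockOrder, fixes the lock acquisition order so that every thread acquires the mutexes in the same order (here mtx1 first, then mtx2). This prevents the situation in which one thread holds mtx1 and waits for mtx2 while another holds mtx2 and waits for mtx1, which would be a deadlock.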

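Note that it is the consistent acquisition order that prevents deadlock. Releasing a lock never blocks, so the unlock order cannot itself cause one. Releasing in the reverse order of acquisition is nevertheless the usual convention, and in the code above mtx2 is always unlocked before mtx1, whichever branch acquired them.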
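When a thread needs both mutexes at once, the standard library can also pick a safe acquisition strategy for it. The sketch below uses `std::lock`, which locks several mutexes with a deadlock-avoidance algorithm, together with `std::lock_guard` and `std::adopt_lock` so the locks are released automatically. The names `mtxA`, `mtxB`, `sharedCounter`, `addForward`, `addReversed`, and `runBoth` are assumptions made for this illustration. In C++17 and later, `std::scoped_lock` offers the same behavior in a single declaration.

```cpp
#include <mutex>
#include <thread>

std::mutex mtxA, mtxB;   // hypothetical mutexes for this sketch
int sharedCounter = 0;   // protected by holding both mtxA and mtxB

// Names the mutexes in the order A, B.
void addForward(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock(mtxA, mtxB);  // acquires both without risk of deadlock
        std::lock_guard<std::mutex> guardA(mtxA, std::adopt_lock);
        std::lock_guard<std::mutex> guardB(mtxB, std::adopt_lock);
        ++sharedCounter;
    }   // both guards release their mutex here
}

// Names the mutexes in the opposite order, B then A. With plain lock()
// calls this could deadlock against addForward; with std::lock it does not.
void addReversed(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock(mtxB, mtxA);
        std::lock_guard<std::mutex> guardB(mtxB, std::adopt_lock);
        std::lock_guard<std::mutex> guardA(mtxA, std::adopt_lock);
        ++sharedCounter;
    }
}

// Runs both workers concurrently and returns the final counter value.
int runBoth(int n) {
    sharedCounter = 0;
    std::thread t1(addForward, n);
    std::thread t2(addReversed, n);
    t1.join();
    t2.join();
    return sharedCounter;
}
```

Because the guards are RAII objects, the mutexes are released even if the protected code returns early or throws, which removes the manual unlock calls from the picture.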

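Timeout mechanism.

The following example uses std::timed_mutex. Each attempt to acquire a mutex carries a timeout, and if the lock is not obtained within that time the thread gives up, so it cannot stay blocked forever: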

#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>

std::timed_mutex mtx1, mtx2;

void worker(int id) {
    if (id == 1) {
        // Thread 1 tries to acquire mtx1
        if (mtx1.try_lock_for(std::chrono::seconds(5))) {
            std::cout << "Thread " << id << ": Acquired mtx1\n";

            // While holding mtx1, try to acquire mtx2 with a 5-second timeout
            if (mtx2.try_lock_for(std::chrono::seconds(5))) {
                std::cout << "Thread " << id << ": Acquired mtx2\n";
                mtx2.unlock();
            } else {
                std::cout << "Thread " << id << ": Could not acquire mtx2 within 5 seconds, releasing mtx1.\n";
            }
            mtx1.unlock();
        } else {
            std::cout << "Thread " << id << ": Could not acquire mtx1 within 5 seconds.\n";
        }
    } else if (id == 2) {
        // Thread 2 tries to acquire mtx2, also with a 5-second timeout
        if (mtx2.try_lock_for(std::chrono::seconds(5))) {
            std::cout << "Thread " << id << ": Acquired mtx2\n";

            // While holding mtx2, try to acquire mtx1, also with a 5-second timeout
            if (mtx1.try_lock_for(std::chrono::seconds(5))) {
                std::cout << "Thread " << id << ": Acquired mtx1\n";
                mtx1.unlock();
            } else {
                std::cout << "Thread " << id << ": Could not acquire mtx1 within 5 seconds, releasing mtx2.\n";
            }
            mtx2.unlock();
        } else {
            std::cout << "Thread " << id << ": Could not acquire mtx2 within 5 seconds.\n";
        }
    }
}

int main() {
    std::thread t1(worker, 1);
    std::thread t2(worker, 2);

    t1.join();
    t2.join();

    return 0;
}

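In this example each thread tries to acquire the two mutexes one after the other, in opposite orders. If a thread cannot obtain the second lock within 5 seconds, it releases the lock it already holds and abandons the operation, so neither thread stays blocked forever.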
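Giving up after one timeout is often too pessimistic, so a common variation is to release everything, back off briefly, and retry a bounded number of times. The sketch below shows that idea with `std::unique_lock`, whose duration-taking constructor calls `try_lock_for` and whose destructor releases the lock. The helpers `withBothLocks` and `attemptWhileSecondIsHeld`, the retry count, and the backoff interval are all assumptions chosen for this illustration.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper: runs work() while holding both mutexes. Each lock
// attempt is bounded by `timeout`; on failure everything is released and
// the whole acquisition is retried, up to maxAttempts times.
template <typename Work>
bool withBothLocks(std::timed_mutex& first, std::timed_mutex& second,
                   std::chrono::milliseconds timeout, int maxAttempts,
                   Work work) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        std::unique_lock<std::timed_mutex> lk1(first, timeout);
        if (!lk1.owns_lock()) continue;          // first lock timed out

        std::unique_lock<std::timed_mutex> lk2(second, timeout);
        if (lk2.owns_lock()) {
            work();
            return true;                         // guards unlock on return
        }

        // Second lock timed out: drop the first lock before backing off so
        // the thread that holds `second` can make progress.
        lk1.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(1 + attempt));
    }
    return false;                                // gave up without blocking forever
}

// Demonstration helper: another thread holds `b` for the whole call, so
// every attempt to take the second lock times out.
bool attemptWhileSecondIsHeld() {
    std::timed_mutex a, b;
    std::atomic<bool> held{false}, done{false};

    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> guard(b);
        held = true;
        while (!done) std::this_thread::sleep_for(std::chrono::milliseconds(1));
    });
    while (!held) std::this_thread::yield();

    bool ok = withBothLocks(a, b, std::chrono::milliseconds(20), 3, [] {});
    done = true;
    holder.join();
    return ok;
}
```

A bounded retry count keeps the worst-case wait predictable. The growing backoff is a simple attempt to reduce the chance that two threads keep timing out in lockstep, though it does not rule that out.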

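Deadlock detection and recovery.

The C++ standard library has no built-in deadlock detection or recovery mechanism, but with careful program design and the standard synchronization primitives (condition variables, mutexes, and so on) we can implement our own detection and recovery strategy.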

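#include <algorithm>
#include <map>
#include <vector>

// Assume the following structures represent the state of resources and processes
struct Process {
    int pid;                              // process ID
    std::vector<int> holdingResources;    // IDs of the resources currently held
    std::vector<int> requestingResources; // IDs of the resources being requested
};

struct Resource {
    int rid;                      // resource ID
    int available;                // number of units currently available
    std::map<int, int> allocated; // units allocated to each process
};

// Assume global data structures store the state of all processes and resources
std::vector<Process> processes;
std::vector<Resource> resources;

// Helper functions, declared here and left unimplemented in this sketch
bool isCycleDetectedInRAG(const Process& p);
Process& getVictimProcess(int pid);
int getResourceToRelease(const Process& victim);
void releaseResource(Process& victim, int rid);
void resolveDeadlock(int pid);

// Custom deadlock detection function (pseudocode)
bool detectAndResolveDeadlocks() {
    // Initialize the resource allocation graph (RAG)
    // ...

    // Check for circular wait
    for (auto& p : processes) {
        // Use topological sort or another method to check for a cycle
        if (isCycleDetectedInRAG(p)) {
            // A cycle was found, so the deadlock must now be resolved
            resolveDeadlock(p.pid);
            return true;
        }
    }

    return false; // no deadlock found
}

// There are many recovery strategies; this is a simplified one (example only)
void resolveDeadlock(int pid) {
    // Pick a process and cancel some of its requests or preempt its resources,
    // for example the process that holds the most resources and has the most
    // unsatisfied requests, and release one of its resources
    Process& victim = getVictimProcess(pid);
    int resourceToRelease = getResourceToRelease(victim);

    // Release the resource, then detection can be run again
    releaseResource(victim, resourceToRelease);
    auto& requests = victim.requestingResources;
    auto it = std::find(requests.begin(), requests.end(), resourceToRelease);
    if (it != requests.end()) {
        requests.erase(it); // erase only if found; erasing end() is undefined
    }
}

// ... other helper functions (getVictimProcess, getResourceToRelease, releaseResource, etc.)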
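The cycle check that the pseudocode leaves open can be sketched concretely. The version below works on a wait-for graph, in which an edge from thread A to thread B means A is blocked on something B holds, and uses a depth-first search to look for a cycle. The names `WaitForGraph`, `Mark`, `dfsFindsCycle`, and `hasDeadlock` are assumptions for this illustration. The sketch also assumes single-instance resources, where a cycle in the graph implies deadlock; with multiple units per resource a cycle is only a necessary condition.

```cpp
#include <map>
#include <vector>

// Wait-for graph: graph[t] lists the threads that thread t is blocked on.
using WaitForGraph = std::map<int, std::vector<int>>;

// DFS colouring. Unvisited must be the first enumerator so that a newly
// inserted map entry (value-initialized to 0) starts as Unvisited.
enum class Mark { Unvisited, InProgress, Done };

// Returns true if a cycle is reachable from `node`.
static bool dfsFindsCycle(int node, const WaitForGraph& graph,
                          std::map<int, Mark>& mark) {
    mark[node] = Mark::InProgress;
    auto it = graph.find(node);
    if (it != graph.end()) {
        for (int next : it->second) {
            Mark state = mark[next];
            // Reaching a node that is still on the DFS stack closes a cycle.
            if (state == Mark::InProgress) return true;
            if (state == Mark::Unvisited && dfsFindsCycle(next, graph, mark)) {
                return true;
            }
        }
    }
    mark[node] = Mark::Done;
    return false;
}

// Returns true if the wait-for graph contains any cycle.
bool hasDeadlock(const WaitForGraph& graph) {
    std::map<int, Mark> mark;
    for (const auto& entry : graph) {
        if (mark[entry.first] == Mark::Unvisited &&
            dfsFindsCycle(entry.first, graph, mark)) {
            return true;
        }
    }
    return false;
}
```

For instance, a graph in which thread 1 waits for thread 2 and thread 2 waits for thread 1 is reported as deadlocked, while a simple chain 1 → 2 → 3 is not. Building the graph at run time (recording which thread holds and which thread requests each lock) is left out of this sketch.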