1 What Is the Sigmoid Function?
The Sigmoid function (also called the logistic function) is one of the most classic activation functions in machine learning. It is an S-shaped function frequently seen in biology, where it is also known as the S-shaped growth curve. In information science, thanks to properties such as being monotonically increasing and having a monotonically increasing inverse, it is often used as the activation function of neural networks, mapping its input into the interval (0, 1). Its mathematical form is:
$$\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}$$
The graph of the function is the characteristic "S"-shaped curve, with the following properties:
- Domain: $(-\infty, +\infty)$
- Range: $(0, 1)$
- Symmetry: centrally symmetric about the point $(0, 0.5)$
- Differentiability: differentiable everywhere
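These properties are easy to confirm numerically. A minimal sketch (the `sigmoid` helper below is our own definition, matching the formula above):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 101)
s = sigmoid(x)

# Range: every output lies strictly inside (0, 1)
assert np.all((s > 0) & (s < 1))

# Central symmetry about (0, 0.5): sigma(-x) = 1 - sigma(x)
assert np.allclose(sigmoid(-x), 1 - s)

print(sigmoid(0.0))  # 0.5, the center of symmetry
```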
2 Mathematical Properties in Detail
2.1 The Derivative
The derivative of the Sigmoid can be expressed in terms of the function itself, which is key in backpropagation:
$$\frac{d\sigma}{dx} = \sigma(x)\,(1 - \sigma(x))$$
Derivation:
$$\begin{aligned}
\frac{d}{dx}\sigma(x) &= \frac{d}{dx}\left( \frac{1}{1 + e^{-x}} \right) \\
&= \frac{e^{-x}}{(1 + e^{-x})^2} \\
&= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
&= \sigma(x)\,(1 - \sigma(x))
\end{aligned}$$
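The identity can also be checked numerically with a central finite difference (a quick sketch, reusing the same `sigmoid` definition):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
h = 1e-6

# Central difference approximation vs. the closed form sigma(x)(1 - sigma(x))
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1 - sigmoid(x))

print(np.max(np.abs(numeric - analytic)))  # close to zero, limited by floating-point error
```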
2.2 Key Properties

| Property | Description |
| --- | --- |
| Nonlinearity | Allows neural networks to learn nonlinear patterns |
| Saturation | The gradient approaches zero for inputs of large absolute value |
| Probabilistic interpretation | The output can be read directly as a probability |
| Smoothness | Derivatives of all orders exist, which aids numerical computation |
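The saturation row is worth a closer look: the derivative peaks at 0.25 (at $x = 0$) and decays roughly exponentially in the tails, which is the root of the vanishing-gradient problem in deep Sigmoid networks. A small illustration (assuming the `sigmoid` helper defined earlier):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

# The gradient shrinks fast as |x| grows: 0.25, ~0.105, ~0.0066, ~4.5e-05
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  gradient = {sigmoid_grad(x):.6f}")
```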
3 Python Implementation
3.1 Visualizing the Function
import numpy as np
import matplotlib.pyplot as plt

# Define the Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Generate the range of x values
x = np.linspace(-10, 10, 100)
y = sigmoid(x)

plt.figure(figsize=(8, 6))

# Plot the Sigmoid curve
plt.plot(x, y, label=r'$\sigma(x) = \frac{1}{1 + e^{-x}}$', color='blue', linewidth=2)
plt.title("Sigmoid Function", fontsize=16, fontweight='bold')
plt.xlabel(r"$x$", fontsize=14)
plt.ylabel(r"$\sigma(x)$", fontsize=14)
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.minorticks_on()
plt.tick_params(which='both', width=2)
plt.tick_params(which='major', length=7)
plt.tick_params(which='minor', length=4, color='gray')

# Add a horizontal reference line at y = 0.5 (the center of symmetry)
plt.axhline(0.5, color='red', linestyle='--', alpha=0.5, linewidth=1)
plt.legend(fontsize=12)
plt.tight_layout()

# Show the figure
plt.show()
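One practical caveat before moving on: for very negative x (below roughly -709 in float64), np.exp(-x) overflows and NumPy emits a RuntimeWarning. A common workaround is to branch on the sign of x; the sketch below is one standard formulation, offered as an example rather than the only correct variant:

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in np.exp for large |x|."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the textbook form is safe
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    # For x < 0, rewrite as exp(x) / (1 + exp(x)); exp(x) < 1, so no overflow
    e = np.exp(x[~pos])
    out[~pos] = e / (1 + e)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]
```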
3.2 Visualizing the Derivative
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the Sigmoid, expressed via the function itself
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Generate data points
x = np.linspace(-10, 10, 400)  # 400 points between -10 and 10
dy = sigmoid_derivative(x)     # evaluate the derivative

# Plot the derivative of the Sigmoid
plt.figure(figsize=(8, 5))
plt.plot(x, dy, label=r"$\frac{d\sigma}{dx}$", linewidth=2.5)
plt.title("Sigmoid Function Derivative", fontsize=14, fontweight='bold')
plt.xlabel(r"$x$", fontsize=12)
plt.ylabel(r"$\frac{d\sigma}{dx}$", fontsize=12)
plt.axvline(0, color='red', linestyle='--', alpha=0.5, label=r"$x=0$")  # maximum at x = 0
plt.grid(color='gray', linestyle='--', linewidth=0.5, alpha=0.6)
plt.legend(loc='best', fontsize=12)
plt.tight_layout()
plt.show()
4 Applications
4.1 Logistic Regression for Binary Classification
In logistic regression, the Sigmoid converts the linear combination $z = w^T x + b$ into a probability:
$$P(y=1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-(w^T x + b)}}$$
Decision boundary: predict class 1 when $\sigma(z) \geq 0.5$, which is equivalent to $z \geq 0$.
from sklearn.linear_model import LogisticRegression

# Toy one-feature dataset
X = [[1.2], [2.4], [3.1], [4.8], [5.0]]
y = [0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# Predicted probabilities for class 0 and class 1
print(model.predict_proba([[3.0]]))
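To see the stated equivalence in code: in scikit-learn, decision_function returns $z = w^T x + b$, and applying the Sigmoid to it reproduces predict_proba. A quick check, continuing from the model fitted above:

```python
import numpy as np

z = model.decision_function([[3.0]])          # the linear score z = w^T x + b
p = 1 / (1 + np.exp(-z))                      # sigmoid of z
print(p, model.predict_proba([[3.0]])[:, 1])  # the two values agree
```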
4.2 As a Neural-Network Activation Function
Although modern deep learning mostly uses ReLU, the Sigmoid is still useful in the following scenarios:
- Output layers of classification tasks that require probability outputs
- Gating mechanisms in special architectures such as LSTM
- Action-selection probabilities in reinforcement learning
4.2.1 Classification Tasks with Probability Outputs
In binary classification problems, the Sigmoid function is commonly used at the output layer to convert the result of the linear combination into a value between 0 and 1, interpreted as the probability that the sample belongs to the positive class.
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model: Sigmoid as the hidden activation, raw logits as output
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.act = nn.Sigmoid()
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        x = self.act(self.fc1(x))
        return self.fc2(x)

# Initialize the model and optimizer
model = SimpleNN()
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on logits
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Train the model
epochs = 100
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}")

# Evaluate the model
model.eval()
with torch.no_grad():
    y_pred = torch.sigmoid(model(X_test)).round().numpy()  # sigmoid turns logits into probabilities
accuracy = accuracy_score(y_test.numpy(), y_pred)
print(f"Test Accuracy: {accuracy:.4f}")
Epoch [10/100], Loss: 0.6665
Epoch [20/100], Loss: 0.6172
Epoch [30/100], Loss: 0.5569
Epoch [40/100], Loss: 0.4926
Epoch [50/100], Loss: 0.4361
Epoch [60/100], Loss: 0.3928
Epoch [70/100], Loss: 0.3627
Epoch [80/100], Loss: 0.3431
Epoch [90/100], Loss: 0.3307
Epoch [100/100], Loss: 0.3229
Test Accuracy: 0.8400
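Note that SimpleNN returns raw logits and BCEWithLogitsLoss applies the Sigmoid internally, which is numerically more stable than a separate nn.Sigmoid followed by nn.BCELoss. The two formulations are mathematically equivalent, as a quick check suggests:

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# Fused: sigmoid + binary cross-entropy in one numerically stable op
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
# Split: explicit sigmoid, then binary cross-entropy on probabilities
loss_split = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_fused.item(), loss_split.item())  # equal up to floating-point error
```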
4.2.2 Gating in LSTM Networks
Inside an LSTM cell, the input, forget, and output gates each pass their pre-activations through a Sigmoid, producing values in (0, 1) that control how much information is written to, kept in, and read from the cell state.
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(1, x.size(0), self.lstm.hidden_size)
        c0 = torch.zeros(1, x.size(0), self.lstm.hidden_size)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])   # take the output of the last time step
        return torch.sigmoid(out)      # squash to a probability

# Smoke-test the model on random data
input_size = 10
hidden_size = 5
output_size = 1
model = SimpleLSTM(input_size, hidden_size, output_size)
x = torch.randn(32, 5, input_size)  # batch size 32, sequence length 5
output = model(x)
print("LSTM output:", output)
LSTM output: tensor([[0.5431],
[0.4636],
[0.4578],
[0.5197],
[0.5001],
[0.5229],
[0.4976],
[0.4924],
[0.5234],
[0.5413],
[0.4881],
[0.4861],
[0.5162],
[0.5169],
[0.4688],
[0.5114],
[0.5059],
[0.5013],
[0.5215],
[0.4460],
[0.5219],
[0.5306],
[0.5099],
[0.4722],
[0.4930],
[0.5114],
[0.5249],
[0.4784],
[0.4850],
[0.4994],
[0.4955],
[0.4910]], grad_fn=<SigmoidBackward0>)
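nn.LSTM hides its gates, so to make the Sigmoid's role explicit, here is a hand-written single LSTM time step (a didactic sketch with our own weight names, not the fused implementation PyTorch uses internally):

```python
import torch

def lstm_cell_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b stack the input/forget/cell/output blocks row-wise."""
    hidden = h.shape[-1]
    gates = x @ W.T + h @ U.T + b
    i, f, g, o = gates.split(hidden, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates in (0, 1)
    g = torch.tanh(g)                  # candidate cell content in (-1, 1)
    c_new = f * c + i * g              # forget old content, admit new content
    h_new = o * torch.tanh(c_new)      # expose a gated view of the cell state
    return h_new, c_new

input_size, hidden = 10, 5
x = torch.randn(32, input_size)
h = torch.zeros(32, hidden)
c = torch.zeros(32, hidden)
W = torch.randn(4 * hidden, input_size)
U = torch.randn(4 * hidden, hidden)
b = torch.zeros(4 * hidden)
h, c = lstm_cell_step(x, h, c, W, U, b)
print(h.shape, c.shape)  # torch.Size([32, 5]) torch.Size([32, 5])
```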
4.2.3 Action-Selection Probabilities in Reinforcement Learning
Example: predicting whether a user will click an ad (0 or 1).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Generate a simulated dataset
# Features: user age, browsing history, ad type, device type
# Label: whether the ad was clicked (0: not clicked, 1: clicked)
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)
print("Predictions:", y_pred)

# Model evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification report:\n", classification_report(y_test, y_pred))
Predictions: [1 1 0 0 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 1 0
0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 1 1 1 0
0 0 1 0 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 0 0 1 1 0 0 1 0 0 0
1 1 0 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 1 1 0 1 0
1 0 0 1 0 0 1 1 0 1 1 0 1 0 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1
1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 0 0
0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1
1 1 0 1 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1 0 0 0
1 0 1 0]
Accuracy: 0.8666666666666667
Classification report:
              precision    recall  f1-score   support

           0       0.85      0.90      0.87       153
           1       0.88      0.84      0.86       147

    accuracy                           0.87       300
   macro avg       0.87      0.87      0.87       300
weighted avg       0.87      0.87      0.87       300
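To tie this back to reinforcement learning itself: in policy-gradient methods with a binary action space, a Sigmoid over a single logit gives the probability of taking action 1, and the action is sampled from the resulting Bernoulli distribution. A toy sketch (the network shape and names are illustrative assumptions):

```python
import torch

# A tiny policy network: 4 state features -> one logit
policy = torch.nn.Linear(4, 1)

state = torch.randn(1, 4)
logit = policy(state)
p_action = torch.sigmoid(logit)        # action-selection probability

dist = torch.distributions.Bernoulli(probs=p_action)
action = dist.sample()                 # 0.0 or 1.0
log_prob = dist.log_prob(action)       # used in the REINFORCE objective
print(p_action.item(), action.item())
```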
4.3 Data Normalization
Squashing feature values of arbitrary range into the (0, 1) interval:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def normalize(values):
    scaled = (values - np.min(values)) / (np.max(values) - np.min(values))
    return sigmoid(scaled * 10 - 5)  # stretch to [-5, 5] to amplify the nonlinear squashing

original = np.array([10, 20, 30, 40, 50])
print(normalize(original))  # [0.00669285 0.07585818 0.5 0.92414182 0.99330715]
5 Hands-On Code with sklearn
5.1 A Logistic Regression Example with sklearn
We use make_classification from sklearn.datasets, a synthetic dataset generator for binary or multi-class problems that allows flexible control over the number of features, class distribution, and other parameters.
Data loading and preprocessing
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a binary classification dataset
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=42)

# Split into training and test sets, 80:20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Training the model
from sklearn.linear_model import LogisticRegression

# Logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)
Model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Predict on the test set
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.2f}")

conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion matrix:")
print(conf_matrix)

class_report = classification_report(y_test, y_pred)
print("Classification report:")
print(class_report)
Model accuracy: 0.90
Confusion matrix:
[[97  7]
 [13 83]]
Classification report:
              precision    recall  f1-score   support

           0       0.88      0.93      0.91       104
           1       0.92      0.86      0.89        96

    accuracy                           0.90       200
Visualizing the decision boundary
import numpy as np
import matplotlib.pyplot as plt

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))

# Predict the class of each grid point
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision boundary and the data points
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.Paired)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.Paired)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Logistic Regression Decision Boundary")
plt.show()
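Since predict_proba is just the Sigmoid applied to the decision function, the hard boundary above can be replaced with probability contours that show the smooth transition directly. A sketch reusing xx, yy, model, X, and y from the block above:

```python
# Probability of class 1 over the same grid
probs = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.figure(figsize=(8, 6))
contour = plt.contourf(xx, yy, probs, levels=20, cmap=plt.cm.RdBu, alpha=0.8)
plt.colorbar(contour, label="P(y = 1)")
plt.contour(xx, yy, probs, levels=[0.5], colors='k', linewidths=1.5)  # the 0.5 boundary
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.RdBu)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Logistic Regression Probability Surface")
plt.show()
```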
5.2 The sklearn Breast Cancer Dataset
We use load_breast_cancer from sklearn.datasets, a binary classification dataset for breast cancer diagnosis.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_curve, auc, accuracy_score, classification_report

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create the logistic regression model
model = LogisticRegression(max_iter=300, solver='lbfgs')
model.fit(X_train_scaled, y_train)

# Predicted probabilities for the positive class on the test set
y_scores = model.predict_proba(X_test_scaled)[:, 1]
y_pred = model.predict(X_test_scaled)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification report:\n", classification_report(y_test, y_pred))

# Compute the ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkblue', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='gray', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=14)
plt.ylabel('True Positive Rate', fontsize=14)
plt.title('Receiver Operating Characteristic (ROC) Curve', fontsize=16)
plt.legend(loc='lower right', fontsize=12)
plt.grid(color='lightgray', linestyle='--', linewidth=0.5)
plt.gca().spines['top'].set_color('none')
plt.gca().spines['right'].set_color('none')
plt.gca().spines['bottom'].set_linewidth(1.2)
plt.gca().spines['left'].set_linewidth(1.2)
plt.gca().tick_params(axis='both', which='major', labelsize=12)
plt.tight_layout()
plt.show()
Accuracy: 0.9736842105263158
Classification report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
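The probability output also makes threshold tuning straightforward: instead of the default 0.5 cutoff, a threshold can be picked from the ROC curve, for example by maximizing Youden's J statistic (TPR - FPR). A short sketch continuing from the fpr, tpr, thresholds, and y_scores computed above:

```python
import numpy as np

# Youden's J picks the point on the ROC curve furthest above the diagonal
best_idx = np.argmax(tpr - fpr)
best_threshold = thresholds[best_idx]
print(f"Best threshold by Youden's J: {best_threshold:.3f}")

# Apply the tuned threshold to the predicted probabilities
y_pred_tuned = (y_scores >= best_threshold).astype(int)
```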