E-commerce Retailer Demand Forecasting and Inventory Optimization (Question 1)

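The data and problem statement come from the 2023 MathorCup College Mathematical Modeling Challenge, Big Data Competition.
Only Question 1 is covered here: ARIMA is used for forecasting, and a clustering algorithm is used to group the series by feature similarity.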

1 Data Loading and Preprocessing

1.1 Removing Duplicates

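Note that Attachment 4 must be deduplicated: it has 56 rows originally and 54 after duplicates are removed. The other three attachments contain no duplicate rows.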

print(fujian1_df.shape)
fujian1_df = fujian1_df.drop_duplicates()
print(fujian1_df.shape)

print(fujian2_df.shape)
fujian2_df = fujian2_df.drop_duplicates()
print(fujian2_df.shape)

print(fujian3_df.shape)
fujian3_df = fujian3_df.drop_duplicates()
print(fujian3_df.shape)

print(fujian4_df.shape)
fujian4_df = fujian4_df.drop_duplicates()
print(fujian4_df.shape)

(331336, 5)
(331336, 5)
(2302, 4)
(2302, 4)
(37, 4)
(37, 4)
(56, 3)
(54, 3)
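The deduplication above relies on `drop_duplicates()`, which treats a row as a duplicate only when every column matches. A plausible reason Attachment 4 matters is that it is later joined on `warehouse_no`, so a repeated warehouse row would duplicate sales records in the merge. The sketch below wraps the check in a small helper that also counts keys still repeated after full-row deduplication; the helper name `drop_duplicates_with_report` and the toy column names are hypothetical.

```python
import pandas as pd


def drop_duplicates_with_report(df: pd.DataFrame, name: str, key: str) -> pd.DataFrame:
    """Drop fully identical rows, then report removed rows and keys that are still repeated."""
    before = len(df)
    deduped = df.drop_duplicates()  # a row counts as a duplicate only if ALL columns match
    repeated_keys = int(deduped[key].duplicated().sum())  # same key but different attributes
    print(f"{name}: {before} -> {len(deduped)} rows, {repeated_keys} repeated '{key}' values left")
    return deduped


# Toy stand-in for Attachment 4; the non-key column names are hypothetical
toy = pd.DataFrame({
    "warehouse_no": ["wh_1", "wh_1", "wh_2"],
    "category": ["A", "A", "B"],
    "region": ["east", "east", "north"],
})
toy = drop_duplicates_with_report(toy, "toy attachment 4", key="warehouse_no")
```

If `repeated_keys` is nonzero on real data, full-row deduplication is not enough to make the later join one-to-one.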

1.2 Merging the Data (may be needed later)

import pandas as pd

# Attach product, seller and warehouse attributes to each demand record
merged_df = pd.merge(fujian1_df, fujian2_df, on='product_no', how='inner')
merged_df = pd.merge(merged_df, fujian3_df, on='seller_no', how='inner')
merged_df = pd.merge(merged_df, fujian4_df, on='warehouse_no', how='inner')
merged_df.sort_values(by='date', ascending=True, inplace=True)
print(merged_df.shape)

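After merging, the data has about 330,000 rows and 13 columns (5 + 3 + 3 + 2, because each join key appears only once).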
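Because the chain above uses `how='inner'`, any sales row whose product, seller or warehouse is missing from a lookup table is silently dropped. A minimal sketch for checking this on toy stand-in tables (every column other than the three join keys, `date` and `qty` is hypothetical): the merge is run with `how='left'` and `indicator=True` only to count the rows an inner join would lose, and `validate='many_to_one'` makes pandas raise an error if a lookup table still repeats a key.

```python
import pandas as pd

# Toy stand-ins for Attachments 1-4 (non-key columns are hypothetical)
sales = pd.DataFrame({
    "seller_no": ["s1", "s1", "s2"],
    "product_no": ["p1", "p1", "p2"],
    "warehouse_no": ["w1", "w1", "w2"],
    "date": pd.to_datetime(["2023-01-02", "2023-01-01", "2023-01-01"]),
    "qty": [5, 3, 7],
})
products = pd.DataFrame({"product_no": ["p1", "p2"], "category": ["c1", "c2"]})
sellers = pd.DataFrame({"seller_no": ["s1", "s2"], "seller_level": ["large", "small"]})
warehouses = pd.DataFrame({"warehouse_no": ["w1", "w2"], "region": ["east", "west"]})

merged = sales
for lookup, key in [(products, "product_no"), (sellers, "seller_no"), (warehouses, "warehouse_no")]:
    # validate="many_to_one" raises pandas.errors.MergeError if the lookup table repeats a key
    merged = pd.merge(merged, lookup, on=key, how="left",
                      validate="many_to_one", indicator=True)
    # Rows marked "left_only" are the ones an inner join would silently drop
    unmatched = int((merged["_merge"] == "left_only").sum())
    print(f"{key}: {unmatched} unmatched sales rows")
    merged = merged.drop(columns="_merge")

merged = merged.sort_values("date").reset_index(drop=True)
print(merged.shape)
```

If every count is 0, the inner and left joins give the same rows, which is consistent with the merged table keeping roughly all of Attachment 1.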

2 Forecasting with ARIMA

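Each seller-product-warehouse combination gets its own model. There are 1996 combinations in total, and fitting and forecasting one model per combination runs in acceptable time.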
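The per-combination models need the data split into one series per (`seller_no`, `product_no`, `warehouse_no`) key. A minimal sketch on a made-up toy table: sorting by date before `groupby` keeps each group's `qty` in chronological order, and `ngroups` gives the number of series to fit.

```python
import pandas as pd

KEYS = ["seller_no", "product_no", "warehouse_no"]

# Toy long-format demand table (values are made up for illustration)
demand = pd.DataFrame({
    "seller_no": ["s1", "s1", "s1", "s1", "s2", "s2"],
    "product_no": ["p1", "p1", "p2", "p2", "p1", "p1"],
    "warehouse_no": ["w1", "w1", "w1", "w1", "w2", "w2"],
    "date": pd.to_datetime(["2023-01-02", "2023-01-01"] * 3),
    "qty": [4, 3, 0, 1, 6, 8],
})

# Sort by date first so every group's qty column is in chronological order
grouped = demand.sort_values("date").groupby(KEYS)
print("number of series:", grouped.ngroups)  # 3 here; the full data set has 1996
print(grouped.size())                        # observations per series
```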

Loop over each group:

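import pandas as pd
import statsmodels.api as sm

# One series per (seller, product, warehouse); grouping the date-sorted merged_df
# is an assumption, since the original does not show how `grouped` is built
grouped = merged_df.groupby(['seller_no', 'product_no', 'warehouse_no'])

# The 15 days after the last observed date (assumes `date` is daily and parseable)
last_date = pd.to_datetime(merged_df['date']).max()
future_dates = pd.date_range(last_date + pd.Timedelta(days=1), periods=15, freq='D')

combined_lis = []
for i, (group_key, group_data) in enumerate(grouped):
    seller_no, product_no, warehouse_no = group_key
    # Fit an ARIMA(1,1,1) model to this series' daily quantities
    model = sm.tsa.ARIMA(group_data['qty'].to_numpy(), order=(1, 1, 1))
    model_fit = model.fit()

    # Forecast sales for the next 15 days
    forecast = model_fit.forecast(steps=15)

    # Store the forecast together with its keys and dates
    forecast_df = pd.DataFrame({'seller_no': seller_no, 'product_no': product_no,
                                'warehouse_no': warehouse_no, 'date': future_dates,
                                'qty': forecast})
    combined_lis.append(forecast_df)
    if i % 200 == 0:
        print(i)  # progress indicator
combined_df = pd.concat(combined_lis, ignore_index=True)
combined_df.to_excel("预测结果1.xlsx", index=False)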

The forecasts are saved to an Excel file (预测结果1.xlsx) with one row per combination and forecast date.

3 Clustering Series by Feature Similarity

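The problem also asks for a discussion: based on the data analysis and modeling process, how should these time series formed by seller, warehouse and product be classified so that series in the same class have the most similar demand characteristics?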

3.1 Reshaping the Data into a 1996 × 166 Matrix

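There are 1996 combinations, each with 166 days of history, so the long-format data first has to be reshaped into a 1996 × 166 matrix (one row per combination, one column per date).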

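import pandas as pd

# Wide table: a 'date' column plus one qty column per combination
date_qty_df_all = None
for i, (group_key, group_data) in enumerate(grouped):
    seller_no, product_no, warehouse_no = group_key
    date_qty_df = group_data[['date', 'qty']]
    new_name = seller_no + "+" + product_no + "+" + warehouse_no
    date_qty_df = date_qty_df.rename(columns={'qty': new_name})

    # Column-wise join on date; the inner join keeps only dates present in every series
    if date_qty_df_all is None:
        date_qty_df_all = date_qty_df
    else:
        date_qty_df_all = pd.merge(date_qty_df_all, date_qty_df, on='date', how='inner')

    if i % 200 == 0:
        print(i)  # progress indicator

# Transpose so rows are combinations and columns are dates (1996 x 166)
X = date_qty_df_all.set_index('date').T
print(X.shape)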
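The merge loop joins 1996 frames one at a time and, because of `how='inner'`, keeps only the dates present in every series. An alternative sketch builds the combination × date matrix with a single `pivot_table` call; here missing days are filled with 0, which assumes a missing record means no sales. The toy data is made up.

```python
import pandas as pd

KEYS = ["seller_no", "product_no", "warehouse_no"]

# Toy long-format demand data; series s2+p1+w2 has no record on 2023-01-02
demand = pd.DataFrame({
    "seller_no": ["s1", "s1", "s1", "s1", "s2"],
    "product_no": ["p1", "p1", "p2", "p2", "p1"],
    "warehouse_no": ["w1", "w1", "w1", "w1", "w2"],
    "date": pd.to_datetime(["2023-01-01", "2023-01-02"] * 2 + ["2023-01-01"]),
    "qty": [3, 4, 1, 0, 8],
})

# One row per combination, one column per date; missing days are filled with 0 (assumption)
wide = demand.pivot_table(index=KEYS, columns="date", values="qty",
                          aggfunc="sum", fill_value=0)
wide.index = ["+".join(key) for key in wide.index]  # same "seller+product+warehouse" naming
X = wide.to_numpy()
print(X.shape)  # (number of combinations, number of dates)
```

Which gap-handling rule is right depends on whether a missing day in the raw data really means zero demand.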

3.2 Applying the Clustering Algorithm

Sweep over candidate numbers of clusters and compute the mean silhouette score for each:

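import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Candidate K values (this range is an assumption; the original does not show it)
cluster_range = range(2, 11)
silhouette_scores = []
for n_clusters in cluster_range:
    kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init=10)
    cluster_labels = kmeans.fit_predict(X)
    # Mean silhouette coefficient: higher means tighter, better-separated clusters
    silhouette_avg = silhouette_score(X, cluster_labels)
    silhouette_scores.append(silhouette_avg)

# Plot the silhouette score against the number of clusters
plt.figure(figsize=(10, 6))
plt.plot(list(cluster_range), silhouette_scores, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Silhouette score')
plt.title('K-means clustering: tuning the number of clusters')
plt.grid(True)
plt.show()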
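The sweep only plots the scores. One common way to finish is to take the K with the highest mean silhouette, refit, and map the labels back to the series names. The sketch below does this on a synthetic matrix built from three made-up demand patterns (flat, trending, weekly cycle) standing in for the real 1996 × 166 data; the K range of 2 to 7 and the series names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
t = np.arange(166)
# Synthetic stand-in for the combination x date matrix: 3 made-up patterns, 20 noisy copies each
patterns = [np.full(166, 10.0),                   # flat demand
            10 + 0.1 * t,                         # upward trend
            10 + 5 * np.sin(2 * np.pi * t / 7)]   # weekly cycle
X = np.vstack([p + rng.normal(0, 0.2, 166) for p in patterns for _ in range(20)])
names = [f"series_{i}" for i in range(len(X))]  # hypothetical series names

# Mean silhouette score for each candidate K
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the K with the highest score, refit, and attach labels to the series names
best_k = max(scores, key=scores.get)
final = KMeans(n_clusters=best_k, random_state=0, n_init=10).fit(X)
clusters = pd.Series(final.labels_, index=names, name="cluster")
print("best K:", best_k)
print(clusters.value_counts())
```

K-means uses Euclidean distance on raw quantities, so two series with the same shape but different sales volume can land in different clusters; standardizing each row before clustering would group by shape instead, if that is the notion of "similar demand" wanted.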