前言
要分析一家公司的供应链,我们需要供应链不同阶段的数据,如有关采购、制造、运输、库存管理、销售和客户人口统计的数据。我为这项任务找到了一个理想的数据集,其中包括一家时尚和美容初创公司的供应链数据。
1. 相关数据集
让我们导入必要的 Python 库,开始执行供应链分析任务
import pandas as pd
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"
data = pd.read_csv("supply_chain_data.csv")
print(data.head())
Product type SKU Price Availability Number of products sold \
0 haircare SKU0 69.808006 55 802
1 skincare SKU1 14.843523 95 736
2 haircare SKU2 11.319683 34 8
3 skincare SKU3 61.163343 68 83
4 skincare SKU4 4.805496 26 871
Revenue generated Customer demographics Stock levels Lead times \
0 8661.996792 Non-binary 58 7
1 7460.900065 Female 53 30
2 9577.749626 Unknown 1 10
3 7766.836426 Non-binary 23 13
4 2686.505152 Non-binary 5 3
Order quantities ... Location Lead time Production volumes \
0 96 ... Mumbai 29 215
1 37 ... Mumbai 23 517
2 88 ... Mumbai 12 971
3 59 ... Kolkata 24 937
4 56 ... Delhi 5 414
Manufacturing lead time Manufacturing costs Inspection results \
0 29 46.279879 Pending
1 30 33.616769 Pending
2 27 30.688019 Pending
3 18 35.624741 Fail
4 3 92.065161 Fail
Defect rates Transportation modes Routes Costs
0 0.226410 Road Route B 187.752075
1 4.854068 Road Route B 503.065579
2 4.580593 Air Route C 141.920282
3 4.746649 Rail Route A 254.776159
4 3.145580 Air Route A 923.440632
[5 rows x 24 columns]
让我们来看看数据集的描述性统计:
print(data.describe())
Price Availability Number of products sold Revenue generated \
count 100.000000 100.000000 100.000000 100.000000
mean 49.462461 48.400000 460.990000 5776.048187
std 31.168193 30.743317 303.780074 2732.841744
min 1.699976 1.000000 8.000000 1061.618523
25% 19.597823 22.750000 184.250000 2812.847151
50% 51.239831 43.500000 392.500000 6006.352023
75% 77.198228 75.000000 704.250000 8253.976921
max 99.171329 100.000000 996.000000 9866.465458
Stock levels Lead times Order quantities Shipping times \
count 100.000000 100.000000 100.000000 100.000000
mean 47.770000 15.960000 49.220000 5.750000
std 31.369372 8.785801 26.784429 2.724283
min 0.000000 1.000000 1.000000 1.000000
25% 16.750000 8.000000 26.000000 3.750000
50% 47.500000 17.000000 52.000000 6.000000
75% 73.000000 24.000000 71.250000 8.000000
max 100.000000 30.000000 96.000000 10.000000
Shipping costs Lead time Production volumes \
count 100.000000 100.000000 100.000000
mean 5.548149 17.080000 567.840000
std 2.651376 8.846251 263.046861
min 1.013487 1.000000 104.000000
25% 3.540248 10.000000 352.000000
50% 5.320534 18.000000 568.500000
75% 7.601695 25.000000 797.000000
max 9.929816 30.000000 985.000000
Manufacturing lead time Manufacturing costs Defect rates Costs
count 100.00000 100.000000 100.000000 100.000000
mean 14.77000 47.266693 2.277158 529.245782
std 8.91243 28.982841 1.461366 258.301696
min 1.00000 1.085069 0.018608 103.916248
25% 7.00000 22.983299 1.009650 318.778455
50% 14.00000 45.905622 2.141863 520.430444
75% 23.00000 68.621026 3.563995 763.078231
max 30.00000 99.466109 4.939255 997.413450
2. 探索性分析
现在,让我们开始分析供应链,看看产品价格与产品收益之间的关系:
fig = px.scatter(data, x='Price',
y='Revenue generated',
color='Product type',
hover_data=['Number of products sold'],
trendline="ols")
fig.show()
因此,公司从护肤品中获得了更多收入,护肤品价格越高,收入越多。现在让我们来看看按产品类型划分的销售额:
sales_data = data.groupby('Product type')['Number of products sold'].sum().reset_index()
pie_chart = px.pie(sales_data, values='Number of products sold', names='Product type',
title='Sales by Product Type',
hover_data=['Number of products sold'],
hole=0.5,
color_discrete_sequence=px.colors.qualitative.Pastel)
pie_chart.update_traces(textposition='inside', textinfo='percent+label')
pie_chart.show()
因此,45% 的业务来自护肤产品,29.5% 来自护发产品,25.5% 来自化妆品。现在我们来看看运输公司的总收入:
total_revenue = data.groupby('Shipping carriers')['Revenue generated'].sum().reset_index()
fig = go.Figure()
fig.add_trace(go.Bar(x=total_revenue['Shipping carriers'],
y=total_revenue['Revenue generated']))
fig.update_layout(title='Total Revenue by Shipping Carrier',
xaxis_title='Shipping Carrier',
yaxis_title='Revenue Generated')
fig.show()
因此,该公司使用三家运输公司进行运输,而 B 运输公司帮助公司创造了更多收入。现在我们来看看公司所有产品的平均交付周期和平均制造成本:
avg_lead_time = data.groupby('Product type')['Lead time'].mean().reset_index()
avg_manufacturing_costs = data.groupby('Product type')['Manufacturing costs'].mean().reset_index()
result = pd.merge(avg_lead_time, avg_manufacturing_costs, on='Product type')
result.rename(columns={'Lead time': 'Average Lead Time', 'Manufacturing costs': 'Average Manufacturing Costs'}, inplace=True)
print(result)
Product type Average Lead Time Average Manufacturing Costs
0 cosmetics 13.538462 43.052740
1 haircare 18.705882 48.457993
2 skincare 18.000000 48.993157
3. 分析 SKU
数据集中有一列 SKUs。你一定是第一次听说。所以,SKU 是库存单位(Stock Keeping Units)的缩写。它们就像特殊的代码,可以帮助公司追踪所有不同的销售商品。想象一下,你有一家大型玩具店,里面有很多玩具。每个玩具都不一样,都有自己的名称和价格,但当你想知道还剩多少时,你需要一种方法来识别它们。因此,你要给每个玩具一个唯一的代码,就像一个只有店里人才知道的秘密编号。数据集中有一列称为 SKU。你一定是第一次听说这个词。所以,SKU 是库存单位(Stock Keeping Units)的缩写。它们就像特殊的代码,可以帮助公司跟踪所有不同的待售商品。想象一下,你有一家大型玩具店,里面有很多玩具。每个玩具都不一样,都有自己的名称和价格,但当你想知道还剩多少时,你需要一种方法来识别它们。因此,你要给每个玩具一个唯一的代码,就像一个只有店里人才知道的秘密编号。这个秘密编号叫做 SKU。
revenue_chart = px.line(data, x='SKU',
y='Revenue generated',
title='Revenue Generated by SKU')
revenue_chart.show()
数据集中还有一列是库存水平。库存水平指的是商店或企业库存产品的数量。现在我们来看看每个 SKU 的库存量:
stock_chart = px.line(data, x='SKU',
y='Stock levels',
title='Stock Levels by SKU')
stock_chart.show()
现在我们来看看每个 SKU 的订单数量:
order_quantity_chart = px.bar(data, x='SKU',
y='Order quantities',
title='Order Quantity by SKU')
order_quantity_chart.show()
4. 成本分析
现在我们来分析一下承运商的运费:
shipping_cost_chart = px.bar(data, x='Shipping carriers',
y='Shipping costs',
title='Shipping Costs by Carrier')
shipping_cost_chart.show()
在上述可视化图表中,我们发现运营商 B 帮助公司获得了更多收入。同时,它也是三家承运商中成本最高的一家。现在我们来看看按运输方式划分的成本分布:
transportation_chart = px.pie(data,
values='Costs',
names='Transportation modes',
title='Cost Distribution by Transportation Mode',
hole=0.5,
color_discrete_sequence=px.colors.qualitative.Pastel)
transportation_chart.show()
因此,公司在公路和铁路运输模式上的货物运输费用较高。
5. 分析缺陷率
供应链中的次品率是指在发货后发现有问题或破损的产品所占的百分比。让我们来看看所有类型产品的平均缺陷率:
defect_rates_by_product = data.groupby('Product type')['Defect rates'].mean().reset_index()
fig = px.bar(defect_rates_by_product, x='Product type', y='Defect rates',
title='Average Defect Rates by Product Type')
fig.show()
因此,护发产品的缺陷率较高。现在我们来看看按运输方式划分的不良率:
pivot_table = pd.pivot_table(data, values='Defect rates',
index=['Transportation modes'],
aggfunc='mean')
transportation_chart = px.pie(values=pivot_table["Defect rates"],
names=pivot_table.index,
title='Defect Rates by Transportation Mode',
hole=0.5,
color_discrete_sequence=px.colors.qualitative.Pastel)
transportation_chart.show()
公路运输的缺陷率较高,而航空运输的缺陷率最低。
供应链分析是指分析供应链的各个组成部分,以了解如何提高供应链的效率,为客户创造更多价值。希望您喜欢这篇关于使用 Python 进行供应链分析的文章。欢迎在下面的评论区提出宝贵的建议。