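Experiment 1: Association Rules (see USB drive)

Experiment name	Association rules

Experiment time	Thursday, March 14, periods 3 and 4
Experiment purpose	Learn to call association rule algorithms from Python. First use the apriori, fpgrowth, or fpmax function to find frequent itemsets, then use the association_rules function to derive association rules, and finally filter the resulting rules.
Experiment environment	Anaconda3, Jupyter Notebook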
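Experiment content (steps, methods, algorithms, programs)	Frequent itemsets are first mined with the apriori, fpgrowth, or fpmax function; association rules are then derived with the association_rules function and filtered.
Step 1: Generate frequent itemsets
(1) Create a DataFrame of frequent itemsets with the fpgrowth function.
(2) Use the association_rules function from the mlxtend library to derive association rules.
(3) Each rule reports its antecedents, consequents, support, confidence, lift, and other metrics. For example, the first rule says that a customer who buys kidney beans is likely to buy eggs as well, with a confidence of 0.8 and a lift of 1.0. A lift of 1.0 means that buying kidney beans does not make eggs any more likely than their baseline frequency, because every transaction in the dataset contains kidney beans.
Step 2: Filter the rules
(1) Compute the antecedent length.
(2) Combine several filter conditions in one selection.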
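The support, confidence, and lift quoted for the kidney beans and eggs rule can be checked by hand. The following is a minimal sketch in plain Python, assuming the five transactions of this experiment with item names translated to English; the helpers `support` and `rule_metrics` are hypothetical names introduced for illustration and are not part of mlxtend.

```python
# Transactions from this experiment, with item names translated to English
# (the translation is an assumption made for this sketch).
dataset = [
    ["milk", "onion", "spice", "kidney beans", "eggs", "yogurt"],
    ["pineapple", "onion", "spice", "kidney beans", "eggs", "yogurt"],
    ["milk", "apple", "kidney beans", "eggs"],
    ["milk", "cucumber", "corn", "kidney beans", "yogurt"],
    ["corn", "onion", "kidney beans", "ice cream", "eggs"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    sup_both = support(set(antecedent) | set(consequent), transactions)
    sup_a = support(antecedent, transactions)
    sup_c = support(consequent, transactions)
    confidence = sup_both / sup_a  # P(consequent | antecedent)
    lift = confidence / sup_c      # confidence relative to the consequent's baseline
    return sup_both, confidence, lift


sup, conf, lift = rule_metrics({"kidney beans"}, {"eggs"}, dataset)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Running it prints `support=0.80 confidence=0.80 lift=1.00`, which agrees with the values reported for that rule.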
Partial source code / experiment configuration
Experiment results and conclusions
!pip install mlxtend

import warnings

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth, association_rules

warnings.filterwarnings('ignore')

# Suppose we have a dataset of supermarket sales records, where each record
# lists the items one customer bought. We want to find which items are
# frequently bought together.
dataset = [['milk', 'onion', 'spice', 'kidney beans', 'eggs', 'yogurt'],
           ['pineapple', 'onion', 'spice', 'kidney beans', 'eggs', 'yogurt'],
           ['milk', 'apple', 'kidney beans', 'eggs'],
           ['milk', 'cucumber', 'corn', 'kidney beans', 'yogurt'],
           ['corn', 'onion', 'kidney beans', 'ice cream', 'eggs']]

# Use the fpgrowth function from mlxtend to find frequent itemsets,
# then derive association rules from them.
# Convert the dataset into a boolean matrix
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Find frequent itemsets with the fpgrowth algorithm
"""
fpgrowth parameters:
1. df: the dataset to mine, a one-hot encoded pandas DataFrame in which each row is a transaction and each column is an item.
2. min_support: the minimum support used to select frequent itemsets. Default: 0.5.
3. use_colnames: if True, the returned itemsets contain column names; otherwise they contain column indices. Default: False.
4. max_len: the maximum number of items in a frequent itemset, which bounds the search space. Default: None (no limit).
5. verbose: if nonzero, print detailed progress information while running. Default: 0.
6. Return value: a DataFrame containing every itemset that meets the minimum support, together with its support.
"""
frequent_itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)
### apriori or fpmax can be used instead
### (fpmax keeps only maximal itemsets, so association_rules on its output may need support_only=True)
#frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
#frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)

# Print the frequent itemsets
print(frequent_itemsets)

# Derive association rules with confidence of at least 0.7
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
# Print the association rules
print(rules)

# Derive association rules with lift of at least 1.1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.1)
rules

# Add a column with the number of items in each antecedent
rules["antecedent_len"] = rules["antecedents"].apply(lambda x: len(x))
rules

# Combine several filter conditions
rules[ (rules['antecedent_len'] >= 2) &
       (rules['confidence'] > 0.75) &
       (rules['lift'] > 1.2) ]

# Select rules by antecedent
rules[rules['antecedents'] == {'eggs', 'kidney beans'}]

# Exclude one specific rule: {onion, kidney beans} -> {eggs}
antecedent_sele = rules['antecedents'] == frozenset({'onion', 'kidney beans'}) # or frozenset({'kidney beans', 'onion'})
consequent_sele = rules['consequents'] == frozenset({'eggs'})
final_sele = (antecedent_sele & consequent_sele)
rules.loc[ ~final_sele ]
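To make the "convert the dataset into a boolean matrix" step concrete, here is a rough pure-Python sketch of the shape of that conversion. The helper `one_hot_encode` and the toy transactions are hypothetical and used only for illustration; the sketch mimics the kind of output TransactionEncoder produces (one sorted column per item, one boolean row per transaction), not its actual implementation.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    # One column per distinct item, in sorted order.
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        # True where the transaction contains the column's item.
        rows.append([item in present for item in columns])
    return columns, rows


# Toy data, assumed for this sketch only.
toy = [["milk", "eggs"], ["eggs", "onion"], ["milk"]]
columns, rows = one_hot_encode(toy)
print(columns)
for row in rows:
    print(row)
```

Each row of the result answers, column by column, whether that transaction contains the item, which is the input format the frequent-itemset functions expect.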
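Experiment reflections and summary	In this experiment we learned the FP-Growth algorithm for association analysis. It stores the transactions in a compressed FP-tree and mines frequent itemsets from that tree, which shrinks the search space and speeds up the search. FP-Growth can also be replaced with Apriori or FPMax. Apriori can find frequent itemsets of any length and supports the discovery of complex association rules. FPMax uses pruning and filtering to reduce the search space, which improves efficiency, and it returns the maximal frequent itemsets.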
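The difference between all frequent itemsets and the maximal ones can be illustrated on the experiment's dataset. The sketch below is a brute-force enumeration in plain Python, not the FP-tree-based method that fpgrowth or fpmax use; the helpers `frequent_itemsets` and `maximal_only` are hypothetical names, and the English item names are an assumed translation of the original data.

```python
from itertools import combinations

# Transactions from this experiment, with item names translated to English.
dataset = [
    ["milk", "onion", "spice", "kidney beans", "eggs", "yogurt"],
    ["pineapple", "onion", "spice", "kidney beans", "eggs", "yogurt"],
    ["milk", "apple", "kidney beans", "eggs"],
    ["milk", "cucumber", "corn", "kidney beans", "yogurt"],
    ["corn", "onion", "kidney beans", "ice cream", "eggs"],
]


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support."""
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            sup = sum(1 for t in sets if candidate <= t) / len(sets)
            if sup >= min_support:
                result[candidate] = sup
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size either.
            break
    return result


def maximal_only(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return {
        s: sup
        for s, sup in itemsets.items()
        if not any(s < other for other in itemsets)
    }


frequent = frequent_itemsets(dataset, 0.6)
maximal = maximal_only(frequent)
print(len(frequent), "frequent itemsets,", len(maximal), "of them maximal")
for itemset, sup in maximal.items():
    print(sorted(itemset), sup)
```

With a minimum support of 0.6 this finds 11 frequent itemsets, of which 3 are maximal. A maximal-only result is more compact, but it omits the supports of the smaller itemsets that rule metrics such as confidence and lift depend on.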
Instructor's evaluation	Grade:    Instructor's signature: