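This is part of an implementation of the Apriori algorithm. Starting from the frequent itemsets that contain 1 item, we want to obtain the frequent itemsets that contain 2 items. The code is as follows: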
from collections import defaultdict

def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return dict([(itemset, frequency) for itemset, frequency in counts.items() if frequency >= min_support])
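I think the frequent itemsets are being counted more than once here. For example, for user 1 the sets {A,B} and {B,A} are the same set, but according to this part of the code: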
for itemset in k_1_itemsets:
    if itemset.issubset(reviews):
        for other_reviewed_movie in reviews - itemset:
            current_superset = itemset | frozenset((other_reviewed_movie,))
            counts[current_superset] += 1
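when itemset == {A}, we count {A,B} once,
when itemset == {B}, we count {B,A} again,
so the same user adds 2 to the count of a single itemset. Is this double counting? If so, how should the program be changed?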
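To make the concern concrete, here is a minimal sketch that runs the counting loop on a single hypothetical user who liked movies "A" and "B". The function names `count_supersets` and `count_supersets_once_per_user` and the sample data are my own assumptions for illustration, and the `min_support` filter is left out so the raw counts are visible. The second function is only one possible fix (collect each user's candidate supersets in a set first, then add 1 per superset), not necessarily what the original author intended.

```python
from collections import defaultdict


def count_supersets(favorable_reviews_by_users, k_1_itemsets):
    """Counting loop from the question, without the min_support filter."""
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return dict(counts)


def count_supersets_once_per_user(favorable_reviews_by_users, k_1_itemsets):
    """Possible fix (an assumption): each user contributes at most 1 per superset."""
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        # Gather this user's candidate supersets; the set removes duplicates
        # such as {A,B} reached from {A} and again from {B}.
        supersets_for_user = set()
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    supersets_for_user.add(itemset | frozenset((other_reviewed_movie,)))
        for superset in supersets_for_user:
            counts[superset] += 1
    return dict(counts)


# Hypothetical data: user 1 liked movies "A" and "B".
reviews_by_user = {1: frozenset({"A", "B"})}
single_itemsets = [frozenset({"A"}), frozenset({"B"})]

original_counts = count_supersets(reviews_by_user, single_itemsets)
deduplicated_counts = count_supersets_once_per_user(reviews_by_user, single_itemsets)

print(original_counts)      # {A,B} is counted twice for the one user
print(deduplicated_counts)  # {A,B} is counted once
```

With the original loop the count for {A,B} is 2 even though only one user reviewed both movies, so comparing that count against `min_support` would overstate the support. The per-user set keeps the count equal to the number of users whose reviews contain the itemset.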