
Is there something wrong with the Apriori algorithm code in Chapter 4 of Introduction and practice of Python data mining?


This is part of the Apriori algorithm code. Starting from the frequent itemsets that contain only 1 item, we want to generate the frequent itemsets that contain 2 items. The code is as follows:

from collections import defaultdict

def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return dict([(itemset, frequency) for itemset, frequency in counts.items() if frequency >= min_support])

I think the frequent itemsets are being counted more than once here. For example, for user 1, the sets {A, B} and {B, A} are the same set, but according to this part of the code:

for itemset in k_1_itemsets:
    if itemset.issubset(reviews):
        for other_reviewed_movie in reviews - itemset:
            current_superset = itemset | frozenset((other_reviewed_movie,))
            counts[current_superset] += 1

When itemset == {A}, {A, B} is counted once.
When itemset == {B}, {B, A} is counted again.
Isn't this counting the same itemset twice? If so, how should the program be modified?
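
One way to check the suspicion is to run the function on a single user who liked exactly two movies. The sketch below is not from the book. The names find_frequent_itemsets_original and find_frequent_itemsets_dedup and the toy data are made up for illustration, and the deduplicated variant is only one possible modification: it collects each user's candidate supersets in a set first, so that every superset is counted at most once per user.

```python
from collections import defaultdict


def find_frequent_itemsets_original(favorable_reviews_by_users, k_1_itemsets, min_support):
    # Same logic as the code in the question (renamed here for comparison).
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return {s: f for s, f in counts.items() if f >= min_support}


def find_frequent_itemsets_dedup(favorable_reviews_by_users, k_1_itemsets, min_support):
    # Hypothetical variant: gather the supersets seen for one user in a set,
    # then add 1 per distinct superset, so each user contributes at most once.
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        supersets_for_user = set()
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    supersets_for_user.add(itemset | frozenset((other_reviewed_movie,)))
        for superset in supersets_for_user:
            counts[superset] += 1
    return {s: f for s, f in counts.items() if f >= min_support}


# Toy data (assumed): user 1 liked movies "A" and "B"; both are frequent 1-itemsets.
favorable_reviews_by_users = {1: frozenset({"A", "B"})}
k_1_itemsets = [frozenset({"A"}), frozenset({"B"})]
pair = frozenset({"A", "B"})

original = find_frequent_itemsets_original(favorable_reviews_by_users, k_1_itemsets, 1)
dedup = find_frequent_itemsets_dedup(favorable_reviews_by_users, k_1_itemsets, 1)

print(original[pair])  # 2: the single user is counted once via {A} and once via {B}
print(dedup[pair])     # 1: the single user is counted once
```

With this toy input the original function does report a count of 2 for {A, B} even though only one user liked both movies, so the repeated counting described above appears to be real. More generally, an itemset of size k can be counted once for each of its frequent (k-1)-item subsets. Note that the deduplicated variant produces smaller counts, so a min_support value chosen against the original counts would probably need to be revisited.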


