
Implementing neighborhood-based collaborative filtering in Python (a movie recommendation example)


1 Introduction

1.1 User behavior data

User behavior data usually includes web browsing, purchases, clicks, ratings, reviews, and so on.

In a personalized recommendation system, user behavior is generally divided into two types:

Explicit feedback
Behavior in which users clearly express their preference for an item. Websites typically collect explicit feedback through ratings or like / dislike buttons.
Implicit feedback
Behavior that does not clearly reflect user preferences. The most representative implicit feedback is page browsing.

According to the direction of the feedback, behavior can also be divided into positive feedback and negative feedback:

Positive feedback
The user's behavior suggests that the user likes the item.
Negative feedback
The user's behavior suggests that the user does not like the item.

With explicit feedback it is easy to tell whether a behavior is positive or negative feedback; with implicit feedback it is relatively difficult to determine.

1.2 User behavior analysis

Before designing a recommendation algorithm on top of user behavior data, researchers need to analyze that data and understand the general patterns it contains. Only then can the analysis guide the design of the algorithm.

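User activity and item popularity
Two quantities are usually examined: user activity (how many items a user has interacted with) and item popularity (how many users have interacted with an item), together with the relationship between them.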
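As a rough illustration of this kind of analysis, the sketch below computes both quantities from a small user-to-items dictionary. The toy data reuses the five users from the example in section 3.1.2; the variable names are hypothetical and not part of the original code.

```python
from collections import Counter

# Toy data (assumed for illustration): user -> set of items the user interacted with.
train = {
    "A": {"a", "b", "d"},
    "B": {"b", "c", "e"},
    "C": {"c", "d"},
    "D": {"b", "c", "d"},
    "E": {"a", "d"},
}

# User activity: number of items each user has interacted with.
user_activity = {user: len(items) for user, items in train.items()}

# Item popularity: number of users who have interacted with each item.
item_popularity = Counter(item for items in train.values() for item in items)

# Average popularity of the items each user touched, to relate the two quantities.
avg_popularity = {
    user: sum(item_popularity[i] for i in items) / len(items)
    for user, items in train.items()
}
```

On a real dataset, plotting avg_popularity against user_activity is one simple way to look at how the two quantities relate.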

There are many collaborative filtering methods:

Neighborhood-based methods (the best known)
Latent factor models
Graph-based random walk algorithms (random walk on graph)

Neighborhood-based methods

User-based collaborative filtering: recommend to a user the items liked by other users with similar interests.
Item-based collaborative filtering: recommend to a user items that are similar to the items he or she liked before.

2 Neighborhood-based algorithms

Neighborhood-based algorithms are the most basic algorithms in recommendation systems. They have been studied in depth in academia and are also widely used in industry.

2.2 User-based collaborative filtering (UserCF)

The basic idea:
In an online personalized recommendation system, when a user A needs a personalized recommendation, the system first finds other users whose interests are similar to A's, and then recommends to A the items that those users like and that A has not yet heard of.

Steps:

(1) Find the set of users whose interests are similar to the target user's.
(2) Find the items that users in this set like and that the target user has not heard of, and recommend them to the target user.
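The two steps can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes cosine similarity over the sets of liked items, and the function names, the parameters k and n, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def user_similarity(train):
    """Step 1: cosine similarity between users, from the items they both like."""
    # Invert user -> items into item -> users so only overlapping pairs are visited.
    item_users = defaultdict(set)
    for user, items in train.items():
        for item in items:
            item_users[item].add(user)

    # common[u][v] = number of items liked by both u and v.
    common = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u in users:
            for v in users:
                if u != v:
                    common[u][v] += 1

    return {
        u: {v: c / math.sqrt(len(train[u]) * len(train[v])) for v, c in related.items()}
        for u, related in common.items()
    }


def recommend_usercf(user, train, sim, k=3, n=5):
    """Step 2: score unseen items by the similarity of the k nearest users who like them."""
    seen = train[user]
    neighbours = sorted(sim.get(user, {}).items(), key=lambda kv: kv[1], reverse=True)[:k]
    scores = defaultdict(float)
    for v, w in neighbours:
        for item in train[v]:
            if item not in seen:
                scores[item] += w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy data (assumed): user -> set of liked items.
train = {
    "A": {"a", "b", "d"},
    "B": {"b", "c", "e"},
    "C": {"c", "d"},
    "D": {"b", "c", "d"},
    "E": {"a", "d"},
}
sim = user_similarity(train)
recs = recommend_usercf("A", train, sim)
```

With k=3, user A's nearest neighbours are E, D and C, so only item c is recommended; item e is liked only by B, who falls outside the top three.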

2.3 Item-based collaborative filtering (ItemCF)

ItemCollaborationFilter

The core idea:
Recommend to users items that are similar to the items they liked before.

Main steps:

(1) Calculate the similarity between items.

(2) Generate a recommendation list for the user from the item similarities and the user's historical behavior.

3 Movie recommendation with ItemCF

3.1 Building the movie co-occurrence matrix and computing the item similarity matrix W

3.1.1 Building the movie co-occurrence matrix

Principle: two movies are linked whenever the same user has watched both.

User A has watched film1 and film2, so the link value between film1 and film2 is 1.

User B has also watched film1 and film2, so the link value is increased by 1.

And so on for every other user.
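A minimal sketch of this counting rule, using the two users from the description above. The dictionary names are hypothetical, and each user's list is assumed to contain no duplicate movies.

```python
from collections import defaultdict
from itertools import permutations

# Viewing histories from the description: both users watched film1 and film2.
watched = {
    "A": ["film1", "film2"],
    "B": ["film1", "film2"],
}

# cooccur[m1][m2] = number of users who watched both m1 and m2.
cooccur = defaultdict(lambda: defaultdict(int))
for movies in watched.values():
    # Every ordered pair of distinct movies watched by the same user adds 1.
    for m1, m2 in permutations(movies, 2):
        cooccur[m1][m2] += 1
```

After processing user A the link value is 1, and after user B it is 2, in both directions.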

3.1.2 Calculating the similarity between movies

An example:
(1) User A has interacted with items a, b and d, user B with items b, c and e, and so on:

A：a、b、d
B：b、c、e
C：c、d
D：b、c、d
E：a、d

(2) Build the item-to-user inverted list.
For example, item a has been interacted with by users A and E, and so on:

a：A、E
b：A、B、D
c：B、C、D
d：A、C、D、E
e：B

(3) Build the item co-occurrence matrix C.

Here C[i][j] records the number of users who like both item i and item j. Normalizing C then gives the similarity matrix W between items.
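The three steps of this example can be reproduced with a short script. It is a sketch under one assumption: W is obtained by dividing C[i][j] by the square root of the product of the two items' user counts, which is the normalization used in the code of section 3.1.3. The names user_items, item_users and N are chosen here for illustration.

```python
import math
from collections import defaultdict

# Step (1): user -> items, as listed in the example.
user_items = {
    "A": ["a", "b", "d"],
    "B": ["b", "c", "e"],
    "C": ["c", "d"],
    "D": ["b", "c", "d"],
    "E": ["a", "d"],
}

# Step (2): item -> users inverted list.
item_users = defaultdict(list)
for user, items in user_items.items():
    for item in items:
        item_users[item].append(user)

# Step (3): C[i][j] = number of users who like both item i and item j.
C = defaultdict(lambda: defaultdict(int))
for items in user_items.values():
    for i in items:
        for j in items:
            if i != j:
                C[i][j] += 1

# Normalize by the number of users N[i] who like each item.
N = {item: len(users) for item, users in item_users.items()}
W = {
    i: {j: c / math.sqrt(N[i] * N[j]) for j, c in row.items()}
    for i, row in C.items()
}
```

For instance, items a and d are both liked by users A and E, so C["a"]["d"] is 2, and W["a"]["d"] is 2 divided by the square root of 2 × 4.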

3.1.3 Code

import math  # needed for math.sqrt in step 3

# Calculate the similarity between movies
def calc_movie_sim(self):
    print('=' * 100)
    print('2. Calculating the movie similarity matrix ......')
    # Build the movie_popular dictionary
    print('-' * 35 + '1. Building the movie popularity dictionary movie_popular ...' + '-' * 26)
    for user, movies in self.trainSet.items():
        for movie in movies:
            # Insert an unseen movie with a count of 0, then add 1 for this user.
            # In the end each key of movie_popular is a movie and each value is
            # the number of users who watched it.
            if movie not in self.movie_popular:
                self.movie_popular[movie] = 0
            self.movie_popular[movie] += 1
    self.movie_count = len(self.movie_popular)
    # print(self.movie_popular)
    print("Total number of movies in the training set = %d" % self.movie_count)
    print('-' * 35 + '2. Building the movie co-occurrence matrix ...' + '-' * 43)
    for user, movies in self.trainSet.items():
        for m1 in movies:
            for m2 in movies:
                if m1 == m2:
                    continue
                # The next three lines set the link between each pair of movies
                # watched by the same user to 1, and add 1 each time another
                # user has also watched both movies.
                self.movie_sim_matrix.setdefault(m1, {})
                self.movie_sim_matrix[m1].setdefault(m2, 0)
                self.movie_sim_matrix[m1][m2] += 1
    print("Movie co-occurrence matrix built successfully!")
    # print("Row for movieId=1 before similarity normalization:")
    # print(self.movie_sim_matrix['1'])
    # Calculate the similarity between movies
    print('-' * 35 + '3. Calculating the final similarity matrix ...' + '-' * 40)
    for m1, related_movies in self.movie_sim_matrix.items():
        for m2, count in related_movies.items():
            # Guard against zero vectors, i.e. a movie with 0 viewers
            if self.movie_popular[m1] == 0 or self.movie_popular[m2] == 0:
                self.movie_sim_matrix[m1][m2] = 0
            else:
                self.movie_sim_matrix[m1][m2] = count / math.sqrt(self.movie_popular[m1] * self.movie_popular[m2])
    print('Movie similarity matrix calculated successfully!')

3.3 Prediction

Compute the interest of user u in an item j:
The recommendation list for a user is generated from the similarity between items and the user's historical behavior. The notation used is:

Puj: the interest of user u in item j.
N(u): the set of items that user u likes (i is an item that the user likes).
S(i, K): the set of the K items most similar to item i (j is an item in this set).
Wji: the similarity between items j and i.
Rui: the interest of user u in item i.
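Using these symbols, the interest score can be written as below. This is a reconstruction that assumes, as the recommend code in this section does, that the K nearest items are taken around each item i the user already likes and that the contributions to the same candidate j are summed:

```latex
P_{uj} = \sum_{i \in N(u),\; j \in S(i, K)} W_{ji} \, R_{ui}
```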

As a result, the more similar an item is to the items the user has liked in the past, the more likely it is to rank high in the user's recommendation list.

from operator import itemgetter  # needed by the sorted(...) calls below

def recommend(self, user):
    K = int(self.n_sim_movie)
    N = int(self.n_rec_movie)
    rank = {}
    watched_movies = self.trainSet[user]
    for movie, rating in watched_movies.items():
        # For every movie the target user has watched, take the K movies with
        # the highest similarity to it from the similarity matrix. Each of
        # those K movies that the user has not watched yet is added to rank:
        # the key is the movie id and the value (the recommendation score)
        # accumulates w (the similarity) times rating (the user's rating).
        # .get() skips movies that have no row in the similarity matrix.
        for related_movie, w in sorted(self.movie_sim_matrix.get(movie, {}).items(), key=itemgetter(1), reverse=True)[:K]:
            if related_movie in watched_movies:
                continue
            rank.setdefault(related_movie, 0)
            # Accumulate the recommendation score
            rank[related_movie] += w * float(rating)
    return sorted(rank.items(), key=itemgetter(1), reverse=True)[:N]
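To see the scoring in isolation, the same logic can be written as a standalone function and run on a tiny hand-made similarity matrix. Everything in this sketch (the function name recommend_items, the movie ids, the similarity values and the ratings) is invented for illustration and does not come from the dataset used in the article.

```python
from operator import itemgetter


def recommend_items(user_ratings, sim_matrix, k=2, n=3):
    """Rank unseen items by summing similarity * rating over the user's items."""
    rank = {}
    for item, rating in user_ratings.items():
        # Keep only the k most similar items of each item the user rated.
        neighbours = sorted(sim_matrix.get(item, {}).items(), key=itemgetter(1), reverse=True)[:k]
        for related, w in neighbours:
            if related in user_ratings:
                continue  # already seen by the user
            rank[related] = rank.get(related, 0.0) + w * float(rating)
    return sorted(rank.items(), key=itemgetter(1), reverse=True)[:n]


# Hypothetical similarity values and ratings.
sim_matrix = {
    "m1": {"m2": 0.8, "m3": 0.5, "m4": 0.1},
    "m2": {"m1": 0.8, "m3": 0.6},
    "m3": {"m1": 0.5, "m2": 0.6, "m4": 0.4},
}
user_ratings = {"m1": 5, "m2": 3}
recs = recommend_items(user_ratings, sim_matrix)
```

Here m3 receives 0.5 × 5 from m1 plus 0.6 × 3 from m2, about 4.3 in total, while m4 never enters the ranking because it is not among the two most similar movies of m1.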

4 Comparison of UserCF and ItemCF

Typical sites for UserCF: news websites
Typical sites for ItemCF: book, e-commerce and movie websites

In principle:
UserCF recommends to a user the items liked by other users who share the same interests.
ItemCF recommends to a user items that are similar to the items he or she liked before.

From the point of view of principle, UserCF recommendations are more social: they reflect how popular an item is within the small group of users who share the user's interests. ItemCF recommendations are more personalized: they reflect the continuity of the user's own interests.

UserCF can recommend to a user the news that a small group of users with similar tastes are reading today, which captures hot topics and timeliness while still guaranteeing a certain degree of personalization. In addition, on news websites items are added much faster than new users join, and new users can simply be recommended the hottest news, so UserCF is the better fit there.

On book and e-commerce websites, however, users' interests are relatively fixed and long-lasting. Technical staff often buy professional books, and many high-quality books are not popular ones, so ItemCF is very suitable there.
The item similarity matrix only needs to be updated about once a day, which puts little pressure on the website. However, the matrix has to be maintained, which requires more storage space.

| Aspect | User-based (UserCF) | Item-based (ItemCF) |
| --- | --- | --- |
| Performance | Suitable when there are relatively few users; otherwise computing the user similarity matrix is expensive. | Suitable when the number of items is significantly smaller than the number of users; otherwise computing the item similarity matrix is expensive. |
| Domain | Domains with strong timeliness, where users' personalized interests are less pronounced. | Domains where users have strong personalized needs. |
| Cold start | After a new user has acted on only a few items, personalized recommendations cannot be made for her immediately, because the user similarity table is computed offline at regular intervals. Some time after a new item goes online, once a user has acted on it, it can be recommended to other users whose interests are similar to that user's. | As soon as a new user acts on one item, other items related to it can be recommended. A new item, however, cannot be recommended to users until the item similarity table has been updated offline. |
| Recommendation explanation | It is hard to provide convincing explanations for the recommendations. | The user's historical behavior can be used to explain the recommendations, which users find convincing. |

References:
https://www.jianshu.com/p/a21944550656

https://blog.csdn.net/qq_40965177/article/details/106636012

https://blog.csdn.net/qq_35704904/article/details/103031962

https://blog.csdn.net/yeruby/article/details/44154009