
Detecting duplicate images with Python and MD5

Editor: Python

Images downloaded by a web crawler often include duplicates. Checking them by eye is laborious, and duplicates are easy to miss. This article computes the MD5 value of each image and compares the values to decide whether an image is a duplicate, as a reference for later use.

MD5 (MD5 Message-Digest Algorithm) is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value. It is commonly used to check that data has been transmitted completely and consistently. Two files with identical bytes always have the same MD5 value, so files whose MD5 values match are treated here as the same image.
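To make the digest size and the "same bytes, same hash" property concrete, here is a minimal sketch using Python's standard `hashlib` module. The byte strings are made-up stand-ins for image data, not real images.

```python
import hashlib

# Made-up byte strings standing in for image file contents (assumption for illustration).
data_a = b"example image bytes"
data_b = b"example image bytes"   # identical content to data_a
data_c = b"example image bytez"   # differs from data_a in the last byte

digest_a = hashlib.md5(data_a)
print(digest_a.digest_size)        # size of the raw digest in bytes
print(len(digest_a.hexdigest()))   # length of the hexadecimal string form

# Identical input bytes give identical digests; different bytes give a different digest here.
same = hashlib.md5(data_a).hexdigest() == hashlib.md5(data_b).hexdigest()
different = hashlib.md5(data_a).hexdigest() != hashlib.md5(data_c).hexdigest()
print(same, different)
```

The hexadecimal string is twice as long as the raw digest because each byte is written as two hex characters.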

The Python code is as follows:

import hashlib
import os
import shutil


def compute_md5(image_path):
    """Return the hexadecimal MD5 value of the file at image_path."""
    with open(image_path, 'rb') as img:
        return hashlib.md5(img.read()).hexdigest()


# MD5 values seen so far (a set gives fast membership checks)
md5_seen = set()

# Directory that duplicate images are moved to
result_dir = "results"
os.makedirs(result_dir, exist_ok=True)

# Directory of images to check for duplicates
image_dir = "images"

for image_name in sorted(os.listdir(image_dir)):
    image_path = os.path.join(image_dir, image_name)
    if not os.path.isfile(image_path):
        continue  # skip subdirectories
    md5 = compute_md5(image_path)
    if md5 not in md5_seen:
        # First file with this content: remember its MD5 value
        md5_seen.add(md5)
    else:
        # The MD5 value already exists: move the duplicate to result_dir
        print(image_name)
        save_path = os.path.join(result_dir, image_name)
        shutil.move(image_path, save_path)
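The script above reads each file fully into memory and moves duplicates immediately. As a possible variation, the sketch below hashes files in chunks and only reports groups of identical files without moving anything. The names `compute_md5_chunked` and `find_duplicates` and the 1 MiB chunk size are assumptions made for this example, not part of the original script.

```python
import hashlib
import os
from collections import defaultdict


def compute_md5_chunked(path, chunk_size=1024 * 1024):
    """Hash a file chunk by chunk so a large file is never fully held in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(directory):
    """Return {md5: [file names]} for every MD5 value shared by two or more files."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # ignore subdirectories
            groups[compute_md5_chunked(path)].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}
```

Calling `find_duplicates("images")` would list each group of identical files, which makes it possible to review the result before deciding which copies to move or delete.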

The code above only detects images that are exact, byte-for-byte copies of each other; it cannot detect images that are merely similar. Similarity calculation or feature-point matching could be added later to detect similar images as well.
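As one illustration of what a similarity calculation could look like, the sketch below implements a simple "average hash" in pure Python. This is a technique swapped in for illustration; the article does not specify which similarity method to use. The functions `average_hash` and `hamming_distance` are hypothetical names, and the input is assumed to be a 2D list of grayscale values already decoded from the image. Reading pixels from an image file would need an image library, which is not shown here.

```python
def average_hash(pixels, hash_size=8):
    """Compute an average hash from a 2D list of grayscale values.

    The image is reduced to hash_size x hash_size cells by block averaging.
    Each cell contributes one bit: 1 if it is brighter than the mean of all cells.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(hash_size):
        y0 = cy * height // hash_size
        y1 = max((cy + 1) * height // hash_size, y0 + 1)
        for cx in range(hash_size):
            x0 = cx * width // hash_size
            x1 = max((cx + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic 16x16 test images (assumption: stand-ins for real decoded images).
original = [[x * 16 for x in range(16)] for _ in range(16)]    # left-to-right gradient
brighter = [[value + 3 for value in row] for row in original]  # slightly brightened copy
inverted = [[240 - value for value in row] for row in original]  # mirrored gradient

print(hamming_distance(average_hash(original), average_hash(brighter)))
print(hamming_distance(average_hash(original), average_hash(inverted)))
```

The brightened copy has different bytes, so its MD5 value would differ from the original's, yet its average hash matches. A small Hamming distance suggests similar images, but the cutoff for "similar" is a tuning choice that depends on the data.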

