
YOLO-v3 real-time detection with OpenCV and Python


Table of contents

- 1. Background knowledge (a brief overview)
  - (1) Deep learning network models
  - (2) YOLO-v3 network structure
- 2. Downloading the YOLO-v3 weight file (.weights), class file (.names), and network configuration file (.cfg)
  - (1) YOLO-v3 weight file
  - (2) YOLO-v3 class file
  - (3) YOLO-v3 .cfg configuration file
- 3. Hands-on code
  - (1) Read the weight file and network configuration file
  - (2) Get the names of the three output layers
  - (3) Read coco.names, which contains the 80 categories
  - (4) Set the confidence threshold and the non-maximum suppression threshold
  - (5) Forward inference and prediction
    - (1) Preprocess the image, set the network input, and run forward inference
    - (2) Create lists to store the results
    - (3) Get the box coordinates, confidences, class probabilities, and predicted classes
    - (4) Apply non-maximum suppression
    - (5) Draw the rectangles
  - (6) Predict a single picture
  - (7) Real-time detection
  - (8) The complete code
  - (9) Single picture prediction results
  - (10) Real-time detection results

1. Background knowledge (a brief overview)

(1) Deep learning network models

https://mydreamambitious.blog.csdn.net/article/details/125459959

(2) YOLO-v3 network structure

To follow this article (real-time detection with YOLO-v3), you only need the key points summarized below; there is no need to dig into every detail of the paper.

Table source: https://blog.csdn.net/qq_37541097/article/details/81214953

Image source: Picture address

2. Downloading the YOLO-v3 weight file (.weights), class file (.names), and network configuration file (.cfg)

(1) YOLO-v3 weight file

https://pjreddie.com/darknet/yolo/

(2) YOLO-v3 class file (coco.names)

https://github.com/pjreddie/darknet/blob/master/data/coco.names

(3) YOLO-v3 .cfg configuration file

yolov3.cfg is in the cfg/ folder of the Darknet repository:

https://github.com/pjreddie/darknet

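After downloading, put the three files in a dnn_model/ folder next to the script: the code below loads dnn_model/yolov3.weights, dnn_model/yolov3.cfg, and dnn_model/coco.names.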
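If you prefer to script the download, here is a minimal sketch using only the standard library. The direct URLs are assumptions rather than links given in this article: the weights URL is the file linked from the Darknet YOLO page, and the .cfg and .names URLs point to the raw files on the master branch of the pjreddie/darknet repository. `missing_model_files` and `download_missing` are hypothetical helper names.

```python
import os
import urllib.request

# Assumed direct download URLs (see the lead-in); adjust if a mirror is needed.
MODEL_FILES = {
    "yolov3.weights": "https://pjreddie.com/media/files/yolov3.weights",
    "yolov3.cfg": "https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg",
    "coco.names": "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names",
}


def missing_model_files(folder="dnn_model"):
    """Return the names of the required files that are not present in `folder`."""
    return [name for name in MODEL_FILES
            if not os.path.isfile(os.path.join(folder, name))]


def download_missing(folder="dnn_model"):
    """Download every missing file into `folder` (requires network access)."""
    os.makedirs(folder, exist_ok=True)
    for name in missing_model_files(folder):
        url = MODEL_FILES[name]
        print(f"downloading {name} from {url}")
        urllib.request.urlretrieve(url, os.path.join(folder, name))
```

Calling `download_missing()` once before running the detector fills in only the files that are absent, so re-running it is cheap; the weights file is large, so the first call can take a while.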

3. Hands-on code

First, consider the form of YOLO-v3's output. Knowing how each output row is laid out makes it straightforward to extract, for every prediction, the box center (x, y), the width and height (w, h), the objectness confidence, and the class probabilities. The box values are normalized to the input image, so they have to be mapped back to the original image.
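As a quick sanity check of that output form, the sketch below computes the shape of each of the three output arrays. It assumes the standard COCO yolov3.cfg with a 416x416 input: detection heads with strides 32, 16 and 8 (13x13, 26x26 and 52x52 grids), 3 anchor boxes per grid cell, and rows of 85 values (center x, center y, width, height, objectness, then 80 class scores).

```python
# Assumptions: standard COCO yolov3.cfg, 416x416 input, 3 anchors per grid cell.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3
NUM_CLASSES = 80

# Each row: 4 box values + 1 objectness score + one score per class.
ROW_LENGTH = 4 + 1 + NUM_CLASSES


def output_shapes(input_size=INPUT_SIZE):
    """Return the (rows, columns) shape of each of the three output arrays."""
    shapes = []
    for stride in STRIDES:
        grid = input_size // stride  # 13, 26, 52 for a 416x416 input
        shapes.append((grid * grid * ANCHORS_PER_CELL, ROW_LENGTH))
    return shapes


for shape in output_shapes():
    print(shape)
# (507, 85)
# (2028, 85)
# (8112, 85)
```

Every row is one candidate box, so a single forward pass yields 507 + 2028 + 8112 = 10,647 candidates; most of them are filtered out by the confidence threshold and non-maximum suppression.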

(1) Read the weight file and network configuration file

# Read the network weight file and configuration file
net=cv2.dnn.readNet(model='dnn_model/yolov3.weights',
                    config='dnn_model/yolov3.cfg')

(2) Get the names of the three output layers

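# YOLO-v3 produces outputs at three scales, from its three unconnected output layers
layerName=net.getLayerNames()
# Store the names of the three output layers; they are passed to net.forward() later
ThreeOutput_layers_name=[]
# getUnconnectedOutLayers() returns 1-based indices; flatten() handles both the
# Nx1 array of older OpenCV versions and the flat array of newer ones
for i in np.array(net.getUnconnectedOutLayers()).flatten():
    ThreeOutput_layers_name.append(layerName[i-1])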
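`getUnconnectedOutLayers()` returns 1-based layer indices, which is why the loop subtracts 1 before indexing into the layer-name list. Its return shape also differs across OpenCV releases (an Nx1 array in older builds, a flat array in newer ones), so the sketch below flattens first. The layer names and indices in the example are hypothetical. In OpenCV 4.x, `net.getUnconnectedOutLayersNames()` returns the names directly and avoids the index arithmetic altogether.

```python
import numpy as np


def output_layer_names(layer_names, unconnected):
    """Map the 1-based indices from getUnconnectedOutLayers() to layer names.

    Flattening first accepts both an Nx1 array such as [[3], [5]]
    and a flat array such as [3, 5].
    """
    return [layer_names[int(i) - 1] for i in np.asarray(unconnected).flatten()]


# Hypothetical layer list and indices, only to show the 1-based lookup.
names = ["conv_0", "relu_0", "yolo_a", "conv_1", "yolo_b"]
print(output_layer_names(names, np.array([[3], [5]])))  # ['yolo_a', 'yolo_b']
print(output_layer_names(names, np.array([3, 5])))      # ['yolo_a', 'yolo_b']
```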

(3) Read coco.names, which contains the 80 categories

# YOLO-v3 (trained on COCO) predicts 80 categories, so load the category names first
with open('dnn_model/coco.names','r') as fp:
    classes=fp.read().splitlines()
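The predicted class ID is used directly as an index into this list, so the list must line up with the model's 80 outputs. A slightly more defensive loader, sketched below as a hypothetical helper `load_class_names`, strips whitespace, skips blank lines (`splitlines()` would keep an empty entry for an extra blank line at the end of the file), and raises if the count does not match, which catches loading the wrong .names file.

```python
def load_class_names(path, expected=80):
    """Read one class name per line, skipping blank lines, and check the count.

    `expected` defaults to 80, the number of COCO classes; pass None to skip the check.
    """
    with open(path, "r", encoding="utf-8") as fp:
        names = [line.strip() for line in fp if line.strip()]
    if expected is not None and len(names) != expected:
        raise ValueError(f"{path}: expected {expected} class names, found {len(names)}")
    return names


# Usage (assuming the file layout used in this article):
# classes = load_class_names('dnn_model/coco.names')
```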

(4) Set the confidence threshold and the non-maximum suppression threshold

# Confidence threshold: candidate boxes scoring below this are discarded
Confidence_thresh=0.2
# Non-maximum suppression (NMS) threshold: overlapping boxes whose IoU exceeds this are suppressed
Nms_thresh=0.35

(5) Forward inference and prediction

(1) Preprocess the image, set the network input, and run forward inference

# Parameters: image, scale factor (1/255 maps pixels to [0, 1]), output size,
# per-channel mean to subtract, swapRB (OpenCV loads images as BGR but the
# network expects RGB, so R and B are swapped), and whether to crop after resizing
blob = cv2.dnn.blobFromImage(frame, 1 / 255, (416, 416), (0, 0, 0), swapRB=True, crop=False)
# Get the height and width of the image
height,width,channel=frame.shape
# Set the network input
net.setInput(blob)
# Forward inference: collect the outputs of the three output layers
predict=net.forward(ThreeOutput_layers_name)
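To make the `blobFromImage` parameters concrete, here is a rough NumPy equivalent for the settings used here (zero mean, scale factor 1/255, swapRB=True, crop=False). It is a sketch that assumes the frame is already 416x416, as it is in this article's code, so it skips the resize that `blobFromImage` would otherwise perform; `blob_like` is a hypothetical name.

```python
import numpy as np


def blob_like(image_bgr, scale=1 / 255.0):
    """Approximate blobFromImage(img, 1/255, size, (0, 0, 0), swapRB=True, crop=False)
    for an image that is ALREADY the target size (no resizing happens here)."""
    rgb = image_bgr[:, :, ::-1]               # swapRB=True: BGR -> RGB
    scaled = rgb.astype(np.float32) * scale   # scale factor maps 0..255 to 0..1
    chw = np.transpose(scaled, (2, 0, 1))     # HWC -> CHW
    return chw[np.newaxis, ...]               # add the batch axis -> NCHW


img = np.zeros((416, 416, 3), dtype=np.uint8)
img[..., 0] = 255                             # pure blue in OpenCV's BGR order
blob = blob_like(img)
print(blob.shape)                             # (1, 3, 416, 416)
```

After the channel swap, the blue values end up in the last channel of the blob (index 2), which is what the network expects for RGB input.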

(2) Create lists to store the results

# Coordinates of the predicted boxes
boxes = []
# Objectness confidence of each prediction
confid_object=[]
# Probability of the predicted class
class_prob=[]
# ID of the predicted class
class_id=[]
# Name of the predicted class
class_names=[]

(3) Get the box coordinates, confidences, class probabilities, and predicted classes

# The output has three scales, so traverse each scale in turn
for scale in predict:
    for box in scale:
        # box = [center_x, center_y, w, h, objectness, class scores...]; the box values
        # are normalized to [0, 1], so multiply by the image size to map them back
        center_x=int(box[0]*width)
        center_y=int(box[1]*height)
        # Width and height of the box
        w=int(box[2]*width)
        h=int(box[3]*height)
        # Coordinates of the top-left corner of the rectangle
        left_x=int(center_x-w/2)
        left_y=int(center_y-h/2)
        boxes.append([left_x,left_y,w,h])
        # Objectness confidence of the detected object
        confid_object.append(float(box[4]))
        # Index of the highest class score (relative to box[5:])
        index=np.argmax(box[5:])
        class_id.append(index)
        class_names.append(classes[index])
        # The class scores start at box[5], so offset the index by 5
        class_prob.append(float(box[5+index]))
# Final score = objectness * class probability, as a plain list for NMSBoxes
confidences=(np.array(class_prob)*np.array(confid_object)).tolist()
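The per-box loop above can also be written with NumPy operations over one whole scale at a time. This sketch assumes each output array has shape (N, 5 + number of classes): four normalized box values, the objectness score, then the class scores. It scores each box as objectness times its best class score (the same product the article uses) and drops low scores before NMS. The class scores start at column 5, so the best class's probability is `box[5 + index]`, not `box[index]`. `decode_scale` is a hypothetical helper; it truncates to int like the loop, but the order of rounding differs slightly, so results can differ by a pixel.

```python
import numpy as np


def decode_scale(output, width, height, score_thresh=0.2):
    """Decode one YOLOv3 output array of shape (N, 5 + num_classes).

    Returns (boxes, scores, class_ids) for rows scoring above score_thresh,
    where each box is [left, top, w, h] in pixels.
    """
    cx = output[:, 0] * width
    cy = output[:, 1] * height
    w = output[:, 2] * width
    h = output[:, 3] * height
    class_scores = output[:, 5:]                  # class scores start at column 5
    class_ids = np.argmax(class_scores, axis=1)
    best = class_scores[np.arange(len(output)), class_ids]
    scores = output[:, 4] * best                  # objectness * best class score
    keep = scores > score_thresh
    boxes = np.stack([cx - w / 2, cy - h / 2, w, h], axis=1)[keep].astype(int)
    return boxes.tolist(), scores[keep].tolist(), class_ids[keep].tolist()


# Synthetic example with 2 classes (a real COCO row has 80 class scores).
out = np.array([[0.5, 0.5, 0.2, 0.4, 0.9, 0.1, 0.8],
                [0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.2]])
print(decode_scale(out, 416, 416))
```

For the full network output, one would call it on each of the three arrays in `predict` and concatenate the three results before NMS.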

(4) Apply non-maximum suppression

# Non-maximum suppression: returns the indices of the boxes to keep
all_index=cv2.dnn.NMSBoxes(boxes,confidences,Confidence_thresh,Nms_thresh)
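`cv2.dnn.NMSBoxes` takes the boxes, their scores, the score threshold and the NMS (IoU) threshold, and returns the indices of the boxes to keep. The sketch below is a plain NumPy version of the greedy algorithm behind it, for illustration only and not OpenCV's actual implementation: discard boxes below the score threshold, repeatedly keep the highest-scoring remaining box, and drop every other box whose intersection-over-union (IoU) with it exceeds the NMS threshold. Boxes use the same [x, y, w, h] layout as `boxes` above; `nms` is a hypothetical name.

```python
import numpy as np


def nms(boxes, scores, score_thresh, iou_thresh):
    """Greedy non-maximum suppression on [x, y, w, h] boxes.

    Returns the indices of the kept boxes, highest score first.
    """
    if len(boxes) == 0:
        return []
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Candidates above the score threshold, sorted by descending score
    order = [int(i) for i in np.argsort(-scores) if scores[i] > score_thresh]
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        if not order:
            break
        rest = np.array(order, dtype=int)
        # Overlap of box i with every remaining candidate
        iw = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        # Keep only candidates that do not overlap box i too much
        order = [int(j) for j, v in zip(rest, iou) if v <= iou_thresh]
    return keep


b = [[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 10, 10]]
print(nms(b, [0.9, 0.8, 0.7], 0.2, 0.35))  # [0, 2]
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68), above the 0.35 threshold, so it is suppressed, while the distant third box survives.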

(5) Draw the rectangles

# Traverse the kept indices and draw each rectangle
# (np.array(...) also handles NMSBoxes returning an empty tuple when nothing is detected)
for i in np.array(all_index).flatten():
    x,y,w,h=boxes[i]
    # Round the confidence to 2 decimal places
    confidence=str(round(confidences[i],2))
    # Draw the rectangle
    cv2.rectangle(img=frame,pt1=(x,y),pt2=(x+w,y+h),
                  color=(0,255,0),thickness=2)
    text=class_names[i]+' '+confidence
    cv2.putText(img=frame,text=text,org=(x,y-10),
                fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                fontScale=1.0,color=(0,0,255),thickness=2)
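Two edge cases in the drawing step: a box near the image border can have a negative left_x or left_y, and a label drawn at y - 10 is cut off when the box starts near the top of the image. The hypothetical helpers below clip a box to the image and move the label inside the box when there is no room above it. `text_h=20` is an assumed label height in pixels (roughly that of FONT_HERSHEY_SIMPLEX at scale 1.0); `cv2.getTextSize` can measure the real height for a given font, scale and thickness.

```python
def clamp_box(x, y, w, h, img_w, img_h):
    """Clip an [x, y, w, h] box so it lies inside an img_w x img_h image."""
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(img_w, x + w), min(img_h, y + h)
    return x1, y1, max(0, x2 - x1), max(0, y2 - y1)


def label_origin(x, y, text_h=20):
    """Put the label 10 px above the box, or just inside it if it would leave the image."""
    if y - 10 - text_h >= 0:
        return (x, y - 10)
    return (x, y + text_h)


print(clamp_box(-5, 10, 20, 30, 416, 416))  # (0, 10, 15, 30)
print(label_origin(5, 5))                   # (5, 25)
```

In the drawing loop, one could call `x, y, w, h = clamp_box(*boxes[i], width, height)` before `cv2.rectangle` and pass `label_origin(x, y)` as the `org` argument of `cv2.putText`.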

(6) Predict a single picture

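# Single picture detection; press any key to close the window
def signa_Picture(image_path='images/smile.jpg'):
    img=cv2.imread(image_path)
    # cv2.imread returns None instead of raising when the file cannot be read
    if img is None:
        raise FileNotFoundError(image_path)
    img=cv2.resize(src=img,dsize=(416,416))
    dst=Forward_Predict(img)
    cv2.imshow('detect',dst)
    cv2.waitKey(0)
    cv2.destroyAllWindows()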

(7) Real-time detection

# Real-time detection from the default webcam; press Esc (key code 27) to quit
def detect_time():
    cap=cv2.VideoCapture(0)
    while cap.isOpened():
        OK,frame=cap.read()
        if not OK:
            break
        # Flip horizontally so the webcam preview behaves like a mirror
        frame=cv2.flip(src=frame,flipCode=1)
        frame=cv2.resize(src=frame,dsize=(416,416))
        dst=Forward_Predict(frame)
        cv2.imshow('detect',dst)
        key=cv2.waitKey(1)
        if key==27:
            break
    cap.release()
    cv2.destroyAllWindows()

(8) The complete code

import cv2
import numpy as np

# Read the network weight file and configuration file
net=cv2.dnn.readNet(model='dnn_model/yolov3.weights',
                    config='dnn_model/yolov3.cfg')
# YOLO-v3 produces outputs at three scales, from its three unconnected output layers
layerName=net.getLayerNames()
# Store the names of the three output layers; they are passed to net.forward() later
ThreeOutput_layers_name=[]
# getUnconnectedOutLayers() returns 1-based indices (Nx1 or flat, depending on the OpenCV version)
for i in np.array(net.getUnconnectedOutLayers()).flatten():
    ThreeOutput_layers_name.append(layerName[i-1])
# YOLO-v3 (trained on COCO) predicts 80 categories, so load the category names first
with open('dnn_model/coco.names','r') as fp:
    classes=fp.read().splitlines()
# Confidence threshold: candidate boxes scoring below this are discarded
Confidence_thresh=0.2
# NMS threshold: overlapping boxes whose IoU exceeds this are suppressed
Nms_thresh=0.35


# Run detection on one frame and draw the results onto it
def Forward_Predict(frame):
    # Parameters: image, scale factor (1/255 maps pixels to [0, 1]), output size,
    # per-channel mean to subtract, swapRB (OpenCV loads BGR, the network expects RGB), crop
    blob = cv2.dnn.blobFromImage(frame, 1 / 255, (416, 416), (0, 0, 0), swapRB=True, crop=False)
    # Get the height and width of the image
    height,width,channel=frame.shape
    # Set the network input
    net.setInput(blob)
    # Forward inference: collect the outputs of the three output layers
    predict=net.forward(ThreeOutput_layers_name)
    # Coordinates of the predicted boxes
    boxes = []
    # Objectness confidence of each prediction
    confid_object=[]
    # Probability of the predicted class
    class_prob=[]
    # ID of the predicted class
    class_id=[]
    # Name of the predicted class
    class_names=[]
    # The output has three scales, so traverse each scale in turn
    for scale in predict:
        for box in scale:
            # box = [center_x, center_y, w, h, objectness, class scores...]; the box values
            # are normalized to [0, 1], so multiply by the image size to map them back
            center_x=int(box[0]*width)
            center_y=int(box[1]*height)
            w=int(box[2]*width)
            h=int(box[3]*height)
            # Coordinates of the top-left corner of the rectangle
            left_x=int(center_x-w/2)
            left_y=int(center_y-h/2)
            boxes.append([left_x,left_y,w,h])
            # Objectness confidence of the detected object
            confid_object.append(float(box[4]))
            # Index of the highest class score; the class scores start at box[5]
            index=np.argmax(box[5:])
            class_id.append(index)
            class_names.append(classes[index])
            class_prob.append(float(box[5+index]))
    # Final score = objectness * class probability, as a plain list for NMSBoxes
    confidences=(np.array(class_prob)*np.array(confid_object)).tolist()
    # Non-maximum suppression: returns the indices of the boxes to keep
    all_index=cv2.dnn.NMSBoxes(boxes,confidences,Confidence_thresh,Nms_thresh)
    # Draw the kept boxes (np.array(...) also handles an empty result)
    for i in np.array(all_index).flatten():
        x,y,w,h=boxes[i]
        # Round the confidence to 2 decimal places
        confidence=str(round(confidences[i],2))
        cv2.rectangle(img=frame,pt1=(x,y),pt2=(x+w,y+h),
                      color=(0,255,0),thickness=2)
        text=class_names[i]+' '+confidence
        cv2.putText(img=frame,text=text,org=(x,y-10),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=1.0,color=(0,0,255),thickness=2)
    return frame


# Real-time detection from the default webcam; press Esc (key code 27) to quit
def detect_time():
    cap=cv2.VideoCapture(0)
    while cap.isOpened():
        OK,frame=cap.read()
        if not OK:
            break
        # Flip horizontally so the webcam preview behaves like a mirror
        frame=cv2.flip(src=frame,flipCode=1)
        frame=cv2.resize(src=frame,dsize=(416,416))
        dst=Forward_Predict(frame)
        cv2.imshow('detect',dst)
        key=cv2.waitKey(1)
        if key==27:
            break
    cap.release()
    cv2.destroyAllWindows()


# Single picture detection; press any key to close the window
def signa_Picture(image_path='images/smile.jpg'):
    img=cv2.imread(image_path)
    # cv2.imread returns None instead of raising when the file cannot be read
    if img is None:
        raise FileNotFoundError(image_path)
    img=cv2.resize(src=img,dsize=(416,416))
    dst=Forward_Predict(img)
    cv2.imshow('detect',dst)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == '__main__':
    # signa_Picture()
    detect_time()