This is the 28th original article in this web scraping column.
In the last section we used OpenCV to identify the gap position in a sliding captcha. At this point, you might ask: isn't deep learning known to be very accurate at image recognition? Can deep learning be used to identify the gap position of a sliding captcha?
Of course it can. In this section, let's learn how to identify sliding captchas using deep learning.
As before, this section focuses on the process of identifying the captcha gap with a deep learning model, so we won't dwell on the deep learning algorithm itself. Also, because the full model implementation is fairly complex, this section doesn't write the code from scratch either; instead, we download prepared code in advance and practice with it hands-on.
So please download the code in advance. The repository address is github.com/Python3WebS…, and you can clone it with Git:
git clone https://github.com/Python3WebSpider/DeepLearningSlideCaptcha2.git
Once it finishes, a local DeepLearningSlideCaptcha2 folder will appear, which means the clone succeeded.
After cloning, switch into the DeepLearningSlideCaptcha2 folder and install the necessary dependencies:
pip3 install -r requirements.txt
Once that finishes, all the libraries the project depends on are installed.
With these preparations done, let's get into the main content of this section.
Identifying the gap in a sliding captcha can actually be framed as an object detection problem. What is object detection? Here is a brief introduction.
Object detection, as the name suggests, is about finding the things we're looking for in an image. For example, take a picture of a dog, as shown in the figure:
We want to know where the dog is and where its tongue is, and to draw boxes around them once found; that is object detection.
After processing by an object detection algorithm, the picture we expect to get looks like this:
You can see that the dog and its tongue are both boxed, which completes the object detection.
Popular object detection algorithms today include R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. If you're interested you can read up on them; not knowing the details has no impact on what we want to achieve in this section.
Current object detection algorithms fall into two main categories, one-stage and two-stage (in English, One Stage and Two Stage). Briefly: a two-stage algorithm first generates candidate regions and then classifies and refines them (the R-CNN family works this way), which tends to be more accurate but slower; a one-stage algorithm predicts object locations and classes in a single pass (SSD and YOLO work this way), which is faster.
So this time we choose YOLO, a representative one-stage object detection algorithm, to identify the sliding captcha gap.
YOLO is short for You Only Look Once; the algorithm's name is formed from the initials.
At the time of writing, the latest version of the YOLO algorithm is V5, while V3 is the most widely used, and V3 is the version this project uses. We won't go into the algorithm's internals here; if you're interested, there is plenty of reference material online, and you can also look into the differences and improvements across the YOLO V1–V3 versions.
As mentioned before, training a deep learning model also requires training data, which again comes in two parts: one part is the captcha images, the other is the annotations, i.e. the gap locations. But unlike before, the annotation this time is no longer simple captcha text, because now we need to represent the gap's position. The gap corresponds to a rectangular box, and describing a rectangular box takes at least four numbers, such as the x and y coordinates of its top-left corner plus the rectangle's width w and height h. So each annotation becomes four numbers.
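As a quick illustration (the numbers below are made up for demonstration), here are two equivalent ways of writing down the same rectangle: the corner-pair form that annotation tools usually save, and the x, y, w, h form described above:
# Two equivalent descriptions of one rectangle (example values only):
x, y, w, h = 321, 87, 86, 80    # top-left corner (x, y) plus width and height
xmin, ymin = x, y               # corner form: top-left corner...
xmax, ymax = x + w, y + h       # ...and bottom-right corner, here (407, 167)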
So next we need to prepare some captcha images and their corresponding four-number labels, such as for the sliding captcha in the figure below:
OK, let's complete these two steps: first, collect captcha images; second, annotate the gap positions and convert them into the four numbers we want.
Our sample website here is captcha1.scrape.center/. Open it and click the login button, and a sliding captcha will pop up.
What we need to do is save the image of the sliding captcha on its own. This is the area:
How do we do that? Taking screenshots by hand is certainly unreliable and laborious, and it's hard to position the boundary accurately, which would make the saved images inconsistent in size. To solve this, we can simply write a script to crop and save the captcha automatically. It's the collect.py file in the repository, and the code is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import WebDriverException
import time
from loguru import logger

COUNT = 1000

for i in range(1, COUNT + 1):
    try:
        browser = webdriver.Chrome()
        wait = WebDriverWait(browser, 10)
        browser.get('https://captcha1.scrape.center/')
        button = wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, '.el-button')))
        button.click()
        captcha = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.geetest_slicebg.geetest_absolute')))
        time.sleep(5)
        captcha.screenshot(f'data/captcha/images/captcha_{i}.png')
    except WebDriverException as e:
        logger.error(f'webdriver error occurred {e.msg}')
    finally:
        browser.close()
Here we define a loop that runs COUNT times. On each iteration we launch a browser with Selenium, open the target website, and simulate clicking the login button to trigger the captcha pop-up. We then wait for the node corresponding to the captcha and save it with the screenshot method.
Let's run it:
python3 collect.py
After running, we can data/captcha/images/
The directory has obtained many verification code pictures , The example is shown in the figure :
Once we have the captcha images, we need to annotate the data. The recommended tool here is labelImg; its GitHub address is github.com/tzutalin/la…, and it can be installed with pip3:
pip3 install labelImg
After installation, run it directly from the command line:
labelImg
This starts labelImg successfully:
Click Open Dir and open the data/captcha/images/ directory, then click Create RectBox to create an annotation box. Drag a box around the rectangle where the gap is; once the box is drawn, labelImg prompts you to name the label. Name it target and click OK, as shown in the figure:
At this point we can see that it has saved an XML file with contents like this:
<annotation>
<folder>images</folder>
<filename>captcha_0.png</filename>
<path>data/captcha/images/captcha_0.png</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>520</width>
<height>320</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>target</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>321</xmin>
<ymin>87</ymin>
<xmax>407</xmax>
<ymax>167</ymax>
</bndbox>
</object>
</annotation>
You can see that the size node contains three child nodes, width, height, and depth, which represent the width, height, and channel count of the original captcha image. In addition, the bndbox node under the object node records the position of the annotated gap; by comparing with the image we can see that xmin and ymin are the coordinates of the top-left corner, while xmax and ymax are the coordinates of the bottom-right corner.
We can process this data with a simple method like the following:
import xmltodict
import json

def parse_xml(file):
    xml_str = open(file, encoding='utf-8').read()
    data = xmltodict.parse(xml_str)
    data = json.loads(json.dumps(data))
    annotation = data.get('annotation')
    width = int(annotation.get('size').get('width'))
    height = int(annotation.get('size').get('height'))
    bndbox = annotation.get('object').get('bndbox')
    box_xmin = int(bndbox.get('xmin'))
    box_xmax = int(bndbox.get('xmax'))
    box_ymin = int(bndbox.get('ymin'))
    box_ymax = int(bndbox.get('ymax'))
    # Normalize the box width and height by the image dimensions
    box_width = (box_xmax - box_xmin) / width
    box_height = (box_ymax - box_ymin) / height
    return box_xmin / width, box_ymin / height, box_width, box_height
Here we define a parse_xml method, which first reads the XML file and uses the xmltodict library to convert the XML string to JSON. It then reads out the width and height of the captcha and the position of the gap in turn, and finally returns the desired data format: the relative coordinates of the gap's top-left corner and its relative width and height, as a tuple.
After all the images are annotated, call this method on each XML file to generate the final annotation results, as sketched below.
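As a minimal sketch of that batch step (the annotation directory and file layout here are assumptions; adjust the paths to wherever labelImg saved your XML files), it could look like this:
import glob
import os

# Hypothetical batch conversion: one .txt label per annotated .xml file.
# Class index 0 stands for our single "target" class; the line layout
# matches the label format shown below.
for xml_file in glob.glob('data/captcha/images/*.xml'):
    x, y, w, h = parse_xml(xml_file)
    name = os.path.splitext(os.path.basename(xml_file))[0]
    with open(f'data/captcha/labels/{name}.txt', 'w') as f:
        f.write(f'0 {x} {y} {w} {h}\n')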
Here, I have already processed the corresponding annotation results, so you can use them directly; the path is data/captcha/labels, as shown in the figure:
Each txt file corresponds to the annotation result of one captcha image, with content like this:
0 0.6153846153846154 0.275 0.16596774 0.24170968
The first number, 0, is the index of the target label; since we only need to detect one kind of gap, the index is 0. The second and third numbers give the position of the gap's top-left corner: 0.615, for example, means the top-left corner's x-coordinate sits at about 61.5% of the captcha's width; multiplied by the captcha width of 520, that comes to about 320, so the top-left corner is offset by about 320 pixels. The fourth and fifth numbers are the ratios of the gap's width and height to the captcha image: the fifth value 0.24, multiplied by the captcha height of 320, gives about 77, meaning the gap is about 77 pixels tall.
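To make that arithmetic concrete, here is a small sketch (the 520 x 320 image size comes from the sample XML above) that converts the normalized label line back into pixels:
# Convert one normalized label line back into pixel values.
# The 520x320 image size is taken from the sample XML annotation above.
line = '0 0.6153846153846154 0.275 0.16596774 0.24170968'
cls, x, y, w, h = line.split()
img_w, img_h = 520, 320
print(round(float(x) * img_w))  # 320: horizontal offset of the top-left corner
print(round(float(y) * img_h))  # 88:  vertical offset of the top-left corner
print(round(float(w) * img_w))  # 86:  gap width in pixels
print(round(float(h) * img_h))  # 77:  gap height in pixels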
OK, at this point the data preparation phase is complete.
For better training results, we also need to download some pre-trained models. Pre-training means a basic model has already been trained in advance; we can directly reuse the weight files from that model instead of training from scratch, and only fine-tune on top of it. This saves training time and usually gives better results.
YOLO V3 needs to load pre-trained weights in order to train well. The download command for the pre-trained model is as follows:
bash prepare.sh
Note: on Windows, please use a Bash command-line tool such as Git Bash to run this command.
Running this script downloads some weight files for the YOLO V3 model, including the yolov3 weights and the darknet weights. Before training, we use these weight files to initialize the YOLO V3 model.
Then training can begin. Execute the following script:
bash train.sh
Note: on Windows, again use a Bash command-line tool such as Git Bash to run this command.
Training on a GPU is also recommended. During training, we can use TensorBoard to watch how loss and mAP change. Run TensorBoard:
tensorboard --logdir='logs' --port=6006 --host 0.0.0.0
Note: please ensure all of the project's dependencies are installed correctly, which includes TensorBoard; once installed successfully, the tensorboard command is available.
After running this command, you can open http://localhost:6006 to observe the loss change during training.
The loss_1 curve looks similar to this:
The val_mAP curve looks similar to this:
You can see the loss drop from a very high initial value to a very low one, while the accuracy gradually approaches 100%.
Here is some of the command-line output during training:
---- [Epoch 99/100, Batch 27/29] ----
+------------+--------------+--------------+--------------+
| Metrics | YOLO Layer 0 | YOLO Layer 1 | YOLO Layer 2 |
+------------+--------------+--------------+--------------+
| grid_size | 14 | 28 | 56 |
| loss | 0.028268 | 0.046053 | 0.043745 |
| x | 0.002108 | 0.005267 | 0.008111 |
| y | 0.004561 | 0.002016 | 0.009047 |
| w | 0.001284 | 0.004618 | 0.000207 |
| h | 0.000594 | 0.000528 | 0.000946 |
| conf | 0.019700 | 0.033624 | 0.025432 |
| cls | 0.000022 | 0.000001 | 0.000002 |
| cls_acc | 100.00% | 100.00% | 100.00% |
| recall50 | 1.000000 | 1.000000 | 1.000000 |
| recall75 | 1.000000 | 1.000000 | 1.000000 |
| precision | 1.000000 | 0.800000 | 0.666667 |
| conf_obj | 0.994271 | 0.999249 | 0.997762 |
| conf_noobj | 0.000126 | 0.000158 | 0.000140 |
+------------+--------------+--------------+--------------+
Total loss 0.11806630343198776
These are the metrics tracked during training, such as loss, recall, precision, and confidence. They represent, respectively, the training loss (the smaller the better), the recall rate (the proportion of targets that should be recognized that actually are; the higher the better), the precision (the proportion of recognized results that are correct; the higher the better), and the confidence (how certain the model is about its predictions; the higher the better). Use them as a reference.
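As a reminder, precision and recall have the standard definitions below; this is nothing specific to this project, just the usual formulas in code form:
# Standard definitions in terms of true positives (tp),
# false positives (fp), and false negatives (fn):
def precision(tp, fp):
    # Of everything the model detected, what fraction was correct?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything that should have been detected, what fraction was found?
    return tp / (tp + fn)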
After training, pth files are generated in the checkpoints folder. These are model files, the same in principle as the best_model.pkl file from before, just in a slightly different format. We can use these models directly to predict gap positions and generate annotation results.
To run a test, we can first put some captcha images into the test folder data/captcha/test:
A sample captcha looks like this:
To run the test, execute the following script:
bash detect.sh
This script reads all the images in the test folder, writes the processed results to the data/captcha/result folder, and prints the recognition results of each captcha to the console.
At the same time, annotated result images are generated in data/captcha/result. Here is a sample:
As you can see, the gap has been accurately identified.
In fact, detect.sh just executes the detect.py file, which contains a key piece of output code, as follows:
bbox = patches.Rectangle((x1 + box_w / 2, y1 + box_h / 2), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
print('bbox', (x1, y1, box_w, box_h), 'offset', x1)
Here bbox is the final outline of the gap, and x1 is the horizontal offset between the left edge of that outline and the left edge of the whole captcha image, i.e. the offset. With these two pieces of information, we have the key position of the gap.
With the target slider position in hand, we can then perform some simulated sliding operations to pass the captcha check; a sketch follows.
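As a rough illustration only (this sketch is not part of the repository's code; the slider selector and the simple step pattern are assumptions), the sliding step with Selenium might look like this:
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

def simulate_drag(browser, offset):
    # Hypothetical slider selector; the real class name depends on the site.
    slider = browser.find_element(By.CSS_SELECTOR, '.geetest_slider_button')
    ActionChains(browser).click_and_hold(slider).perform()
    # Move in small steps so the trajectory looks less robotic.
    moved = 0
    while moved < offset:
        step = min(10, offset - moved)
        ActionChains(browser).move_by_offset(step, 0).perform()
        moved += step
    ActionChains(browser).release().perform()
In practice, a human-like trajectory (acceleration, slight overshoot, pauses) tends to work much better against behavior-based checks than uniform steps.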
This section mainly covered the overall process of training a deep learning model to identify the gap in a sliding captcha. In the end, we successfully completed the training process and obtained a deep learning model file.
With this model, we can feed in a sliding captcha and the model will predict the location of the gap, including its offset and width; the corresponding position can then be drawn from that gap information.
Of course, the approach introduced in this section can still be optimized further.
Code in this section :github.com/Python3WebS…
Thank you very much for reading. For more great content, please follow my official accounts 「Attacking Coder」 and 「Cui Qingcai」.