Resource download address: https://download.csdn.net/download/sheziqiong/85795438
On some websites, the URL contains encrypted parameters. The logic that produces them is written in JavaScript, and that JavaScript is usually minified, obfuscated, and encrypted. This project first locates the encryption entry point using AJAX (XHR) breakpoints and hooks, then uses playwright to mount the encryption method on the window object, so the method can be executed in a real browser environment. This way the parameter encryption can be reproduced without knowing the exact encryption logic.
2. Results
The list page is loaded through an AJAX request. Its token parameter is encrypted, limit is the number of movies shown per page, and offset, judging from observation, equals (current page number - 1) * 10.
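A minimal Python sketch of how the list-page query string could be assembled from these observations. It assumes the endpoint is `https://spa6.scrape.center/api/movie/` and takes the token as an already-computed string; the helper names `list_page_offset` and `list_page_url` are hypothetical, and generalizing the factor 10 to `limit` is an assumption.

```python
from urllib.parse import urlencode

# Assumption: list endpoint of the target site
LIST_API = 'https://spa6.scrape.center/api/movie/'


def list_page_offset(page: int, limit: int = 10) -> int:
    """offset = (page - 1) * limit, i.e. (page - 1) * 10 with the default limit."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return (page - 1) * limit


def list_page_url(page: int, token: str, limit: int = 10) -> str:
    """Build the list-page URL; the token must be computed separately."""
    params = {'limit': limit, 'offset': list_page_offset(page, limit), 'token': token}
    return LIST_API + '?' + urlencode(params)
```

For page 3 this yields `offset=20`; the token itself still has to come from the encryption method discussed next.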
To see how token is constructed, set an AJAX (XHR) breakpoint so that execution pauses when the request is sent, then use the call stack to find where token is generated.
Walking up the call stack leads to the token generation entry point.
Set a breakpoint at line 169, refresh the page, and add _0x2fa7bd['a'] and this['$store']['state']['url']['index'] to Watch.
It is easy to see that _0x2fa7bd['a'] is a method and that the argument passed to it is /api/movie.
At this point there are two ways forward. The first is to click chunk-4dec7ef0.e4c2b130.js:1, inspect how this method builds its result, and reimplement it in Python. The second is to mount _0x2fa7bd['a'] directly on the browser's window object. Which one to choose depends on how complex the method is, so we first click in to look at it.
Set a breakpoint at its return statement, refresh, and watch how the variables change.
Starting from /api/movie, the method builds a list, joins the list into a string, and hashes that string with SHA1 into a 40-character hexadecimal digest. Reimplementing this in Python would mean importing hashlib to call sha1 and base64 for the encoding step, which is more involved, so here we go with mounting the method on the window object.
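For comparison, a Python sketch of the list, join, SHA1, base64 pipeline described above. The text does not spell out what goes into the list besides `/api/movie` or how the SHA1 digest is combined with base64, so this sketch assumes the list is `[path, current Unix timestamp]` joined with commas, and that the token is the base64 encoding of `"<sha1 hex>,<timestamp>"`. Treat both as assumptions to check against the watched variables, not as the site's confirmed algorithm; `make_token` is a hypothetical name.

```python
import base64
import hashlib
import time
from typing import Optional


def make_token(path: str, timestamp: Optional[int] = None) -> str:
    # Assumption: the list is [path, timestamp]; the timestamp defaults to "now"
    ts = str(int(time.time()) if timestamp is None else timestamp)
    parts = [path, ts]
    # Join the list into one string and hash it: 40 hexadecimal characters
    sign = hashlib.sha1(','.join(parts).encode('utf-8')).hexdigest()
    # Assumption: the digest and timestamp are joined again, then base64-encoded
    return base64.b64encode(','.join([sign, ts]).encode('utf-8')).decode('ascii')
```

If the watched variables show a different list content or join order, only the `parts` line needs to change.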
Download the JS file that contains the encryption method.
Open the file in VS Code and, on the line right after the method call, add the following statement:
window.encrypt = Object(_0x2fa7bd['a']);
This turns the method into a property of the window object.
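The manual VS Code edit can also be scripted, which makes it repeatable when the site ships a new bundle. This is a sketch, assuming you choose an anchor string in the downloaded file that ends exactly at the end of the statement containing the method call (for example at its `;`); the anchor and the helper names `inject_after`/`patch_file` are hypothetical. Inserting right after that statement keeps `_0x2fa7bd` in scope, because it stays inside the same minified function.

```python
from pathlib import Path

# The statement from the step above that exposes the method on window
MOUNT_STATEMENT = "window.encrypt = Object(_0x2fa7bd['a']);"


def inject_after(source: str, anchor: str, statement: str = MOUNT_STATEMENT) -> str:
    """Insert `statement` on its own line right after the first occurrence of `anchor`."""
    index = source.find(anchor)
    if index == -1:
        raise ValueError(f'anchor not found: {anchor!r}')
    end = index + len(anchor)
    return source[:end] + '\n' + statement + '\n' + source[end:]


def patch_file(src: Path, dst: Path, anchor: str) -> None:
    """Read the downloaded bundle, inject the statement, and save the patched copy."""
    patched = inject_after(src.read_text(encoding='utf-8'), anchor)
    dst.write_text(patched, encoding='utf-8')
```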
Then use the playwright library to make the browser load our modified JS files in place of the originals:
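from playwright.sync_api import sync_playwright

class Spider:  # illustrative name; the excerpt only shows self.page and self.BASEURL
    BASEURL = 'https://spa6.scrape.center/'

    def __init__(self):
        self.playwright = sync_playwright().start()
        browser = self.playwright.chromium.launch()
        self.page = browser.new_page()
        # Serve the locally modified JS files in place of the site's originals
        self.page.route('https://spa6.scrape.center/js/chunk-19c920f8.c3a1129d.js',
                        lambda route: route.fulfill(path='./project6/chunk.js'))
        self.page.route('https://spa6.scrape.center/js/chunk-4dec7ef0.e4c2b130.js',
                        lambda route: route.fulfill(path='./project6/chunk-id.js'))
        # Load the site so the patched scripts run and register the window methods
        self.page.goto(self.BASEURL)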
The playwright browser instance can now call the encryption method. Next we write a method that has the browser execute a small JavaScript function, which passes the given argument to the encryption method and returns its output:
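def get_token(self, params):
    # Pass params as an argument instead of formatting it into the JS source,
    # so quotes or backslashes in params cannot break the expression
    return self.page.evaluate('(p) => window.encrypt(p)', params)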
With that, the list-page URL is fully constructed; all that remains is to request it with requests, aiohttp, Scrapy, or a similar library.
The detail page is also loaded through an AJAX request, this time with two encrypted parameters. Both are long strings that look base64-encoded. The token may be constructed the same way as on the list page; how the first parameter, which we will call id, is constructed is unclear.
First examine the encryption entry point for token to see whether it uses the same method as the list page, and if so, what the encrypted input is.
The token encryption method turns out to be the same, but the input differs: /api/movie/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx. In other words, id is appended to /api/movie/, so the next task is to find how id is encrypted. The call stack does not reveal where id is built, so we take another route. Since id looks base64-encoded, it is probably produced with btoa, so we can use the Tampermonkey plugin to write a simple JavaScript script that hooks btoa and captures its calls:
// ==UserScript==
// @name         hookbase64
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        https://spa6.scrape.center/
// @icon         https://www.google.com/s2/favicons?sz=64&domain=greasyfork.org
// @grant        none
// ==/UserScript==
(function () {
    'use strict';
    // Replace object[attr] with a wrapper that logs and pauses on every call
    function hook(object, attr) {
        var func = object[attr]; // keep a reference to the original method
        object[attr] = function () {
            console.log('hooked', object, attr); // which object and method were called
            var ret = func.apply(object, arguments); // call the original method
            console.log('ret', ret); // log the returned (encoded) result
            debugger; // key step: pause here so the call stack can be inspected
            return ret;
        };
    }
    hook(window, 'btoa'); // hook window.btoa, the browser's built-in base64 encoder
})();
Remove all breakpoints on the page and refresh.
Strangely, the script was not triggered.
What could be the reason? Perhaps id is not generated in this page's AJAX request at all, but in the list page's AJAX request. Let's check.
Found it: one of the arguments is a string of meaningless characters. Following the call stack, trace upward through its callers step by step.
Eventually we reach the entry point where id is constructed. Inside it, two strings are concatenated and base64-encoded into a long string. One string is fixed; the other is the argument passed in, which must be the unique identifier that distinguishes each movie. Next, look at the list page's JSON response.
From it, we can infer that the ID field is the argument passed in.
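Decoding the sample id from the detail URL is a quick way to check this inference: its base64 payload decodes to `ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb1`, a fixed-looking string followed by `1`. The sketch below assumes the sample belongs to the movie whose ID field is 1 and that the fixed string always comes first; `encode_id` is a hypothetical pure-Python stand-in for the mounted method, not the site's confirmed code.

```python
import base64

# Sample id taken from the detail path /api/movie/<id> observed above
SAMPLE_ID = 'ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx'

decoded = base64.b64decode(SAMPLE_ID).decode('utf-8')

# Assumption: the sample is for movie ID 1, so the fixed string is
# everything before that trailing "1"
FIXED_PREFIX = decoded[:-1]


def encode_id(movie_id: int) -> str:
    """Concatenate the fixed string with the movie ID and base64-encode the result."""
    return base64.b64encode((FIXED_PREFIX + str(movie_id)).encode('utf-8')).decode('ascii')
```

If `encode_id(1)` reproduces the sample, the inference holds for that movie; other IDs from the list JSON can then be checked against the detail links the site itself produces.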
We then reuse the list-page approach: mount this method on the window object with playwright (as window.encrypt_id) and build an id-generation method around it:
def get_id(self, params):
    # Pass params as an argument rather than string-formatting it into the JS source
    return self.page.evaluate('(p) => window.encrypt_id(p)', params)
Once all the URLs can be constructed, the rest of the crawl is straightforward.
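Putting the two mounted methods together, a detail-page URL could be assembled as sketched below. The `get_id` and `get_token` arguments are meant to be the two methods shown in this article (bound to the playwright-backed crawler) or any callables with the same behavior. Computing the token over `/api/movie/<id>` follows the observation made when inspecting the detail request; the endpoint shape `https://spa6.scrape.center/api/movie/<id>/?token=...` and the helper name `detail_url` are assumptions.

```python
from typing import Callable
from urllib.parse import quote

# Assumption: the site's origin for API requests
BASE = 'https://spa6.scrape.center'


def detail_url(movie_id: int,
               get_id: Callable[[str], str],
               get_token: Callable[[str], str]) -> str:
    encoded = get_id(str(movie_id))   # encrypted id, e.g. via window.encrypt_id
    path = '/api/movie/' + encoded    # the token is computed over this path
    token = get_token(path)           # e.g. via window.encrypt
    # Percent-encode the token so base64 characters such as '+' and '=' survive
    return f'{BASE}{path}/?token={quote(token)}'
```

Passing the methods in as callables keeps the URL logic testable without launching a browser.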