Previously, I summarized some common Selenium operations. I thought that was all there was to it. I was wrong! Selenium is much more powerful than I thought.
This post mainly covers: getting past the proxy authentication pop-up, capturing the browser's background requests, and carrying cookies to skip login.
Yes, just these few things, and every one of them nearly made me give up, because every answer you find online is the same and none of them work. If you are stuck on any of these problems, this post will be good news for you. OK, let's start!
You may have run into this: you are going through a proxy, and as soon as a cross-domain request occurs the browser asks for a user name and password. That interrupts Selenium's automated run.
This pop-up is not a JS alert, so switch_to_alert() cannot handle it; nor is it built from page elements, so element locating cannot reach it either.
I also tried other approaches, such as simulating key presses: using Tab to switch between the inputs and then pressing Enter.
That ended in failure. The pop-up never actually receives keyboard focus, even though it looks focused.
I also tried simulating a mouse click on the text box before typing. That failed too, for reasons unknown.
So I had to change my approach: instead of letting the authentication box pop up and then dealing with it, can it be prevented from popping up at all? I searched online and, sure enough, there is a way: start the browser with a proxy plug-in.
The principle is simple. Build a plug-in from manifest.json and background.js with the user name and password configured in it; after that, whenever authentication is required, the plug-in handles it automatically and no box pops up. This is my own summary and may not be the official explanation, but that is the gist.
Here is how to generate the proxy plug-in:
import string
import zipfile

# Package a Chrome proxy plug-in
def create_proxyauth_extension(proxy_host, proxy_port, proxy_username, proxy_password, scheme='http', plugin_path=None):
    """Build a Chrome extension that sets an authenticated proxy.

    Args: proxy_host (proxy address), proxy_port (port number),
        proxy_username (user name), proxy_password (password).
    Returns: plugin_path, the path of the generated plug-in.
    """
    if plugin_path is None:
        # Plug-in generation path (the plugins/ directory must already exist)
        plugin_path = 'plugins/vimm_chrome_proxyauth_plugin.zip'
    manifest_json = """{
        "version": "1.0.0", "manifest_version": 2, "name": "Chrome Proxy",
        "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                        "<all_urls>", "webRequest", "webRequestBlocking"],
        "background": {"scripts": ["background.js"]}, "minimum_chrome_version": "22.0.0"
    }"""
    background_js = string.Template("""
    var config = {mode: "fixed_servers", rules: {
        singleProxy: {scheme: "${scheme}", host: "${host}", port: parseInt(${port})},
        bypassList: ["foobar.com"]}};
    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
    function callbackFn(details) {
        return {authCredentials: {username: "${username}", password: "${password}"}};
    }
    chrome.webRequest.onAuthRequired.addListener(
        callbackFn, {urls: ["<all_urls>"]}, ['blocking']);
    """).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )
    # Package the plug-in: the zip only needs these two files
    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    return plugin_path
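Before handing the archive to Chrome, it can help to confirm that it contains what the principle above requires. The following is only a minimal sketch under my own assumptions: `check_plugin` is a hypothetical helper name, not part of Selenium or of the code above, and the checks (both files present, manifest parses as JSON, no placeholder left unsubstituted) are simply the failure modes that seem most likely.

```python
import json
import zipfile


def check_plugin(plugin_path):
    """Return the parsed manifest if the archive looks like a loadable plug-in.

    Hypothetical helper: raises ValueError when something is obviously wrong.
    """
    with zipfile.ZipFile(plugin_path) as zp:
        missing = {"manifest.json", "background.js"} - set(zp.namelist())
        if missing:
            raise ValueError("plug-in is missing: %s" % sorted(missing))
        # json.loads raises a ValueError subclass if the manifest is not valid JSON
        manifest = json.loads(zp.read("manifest.json").decode("utf-8"))
        script = zp.read("background.js").decode("utf-8")
    if "${" in script:
        # A leftover ${...} means string.Template.substitute was never applied
        raise ValueError("background.js still contains an unsubstituted placeholder")
    return manifest
```

Calling `check_plugin('plugins/vimm_chrome_proxyauth_plugin.zip')` right after `create_proxyauth_extension(...)` would then fail early, instead of Chrome silently ignoring a broken plug-in.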
The code is quite simple, and a careful read should be enough to follow it. The next step is to start the browser with the plug-in.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def get_chrome_by_proxy():
    """Get a Chrome browser object (started with the proxy plug-in).

    :return: Chrome browser object
    """
    # `service` and `CONFIG` are module-level objects defined elsewhere in the project
    service.command_line_args()
    # Start the browser driver
    service.start()
    # Create the Chrome proxy plug-in
    plugin_path = create_proxyauth_extension(
        proxy_host=CONFIG["proxy_host"],
        proxy_port=int(CONFIG["proxy_port"]),
        proxy_username=CONFIG["proxy_user"],
        proxy_password=CONFIG["proxy_pwd"]
    )
    options = Options()
    options.add_argument("--start-maximized")
    # options.add_argument('--incognito')  # Incognito mode (dark window); keep disabled, it breaks the plug-in
    options.add_argument('--ignore-certificate-errors')  # Ignore certificate warnings
    options.add_experimental_option("detach", True)  # Do not close the browser automatically
    options.add_argument("--disable-gpu")  # Disable the GPU
    options.add_experimental_option("prefs", {
        "profile.managed_default_content_settings.images": 2})  # Do not load images
    options.add_experimental_option('excludeSwitches', ['enable-automation'])  # Hide the "controlled by automated software" prompt
    # options.add_experimental_option('useAutomationExtension', False)  # Disable the automation extension
    options.add_extension(plugin_path)
    return webdriver.Chrome(options=options)
This one is a special pitfall. I only discovered it after N attempts, though maybe only people like me fall into it.
It concerns the option I had taken to be a switch for a black browser theme:
options.add_argument('--incognito')
My advice is not to use it: the plug-in stops working once it is set. The flag does more than change the look of the browser. It starts Chrome in incognito mode, where extensions are disabled by default, so the proxy plug-in never runs.
I only wanted the browser to look nicer, and it brought me to the edge of giving up. I never dreamed the problem would be here. So my suggestion: don't play with fancy options.
Here is the requirement: some jobs at my company involve frequent operations on a website. By inspecting the network requests in the browser's developer tools, you find that the background requests actually return more detailed data, and that modifying the request parameters returns even more. So the data we want can be fetched by constructing the request with the requests module.
But analysis showed that the headers must carry a token to get a response. Copying the token by hand works perfectly well. The trouble is that the token is very long and the rules for constructing it are hard to work out. So can we read the request data from the browser's background traffic and take the token from there?
This was also a sad journey. The approach Baidu suggests most often is the browsermob-proxy plug-in.
I tried it and failed. It works by packaging the background requests into a HAR file, which can then be filtered for what we want. Maybe I did something wrong, but no matter what I tried I could not get the result I wanted. And at the company the browser also has to handle the authentication box, which this approach clearly cannot do. After many setbacks, I gave up.
Then I noticed another approach, ajax-hook.js, which apparently can grab the JSON data directly through a JS script. Wouldn't that be great? But even though the blogger's write-up at https://zhuanlan.zhihu.com/p/158394821 is very detailed, I still did not succeed. Executing the JS always reported a syntax error, and I tried many different versions. In the end I gave up on this too!
The last method is the browser's own logging. The get_log() method returns the browser log, which records the structure and return value of every request in great detail. That gives us the data we want! Here is the code:
First, constructing the browser object. It reuses the structure above, and some parameters can be dropped as appropriate. There are subtle differences, so read carefully.
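For readers who do manage to get a HAR file out of browsermob-proxy, the filtering step might look roughly like the sketch below. It relies only on the standard HAR layout (`log.entries[].request.url` and `request.headers` as a list of name/value pairs). The function name `find_header_in_har` and the header name `token` in the usage note are my own assumptions for illustration; this is not the code I ran.

```python
def find_header_in_har(har, header_name, url_contains=""):
    """Yield (url, value) for each request in a HAR dict carrying header_name.

    `har` is the already-parsed HAR document (a dict). Header names are
    compared case-insensitively, as HTTP header names are.
    """
    wanted = header_name.lower()
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url", "")
        if url_contains not in url:
            continue  # not a request we care about
        for header in request.get("headers", []):
            if header.get("name", "").lower() == wanted:
                yield url, header.get("value")
```

With a HAR saved to disk, `list(find_header_in_har(json.load(open(path)), "token"))` would list every request that carried a `token` header, where `path` is wherever the HAR file was written.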
def get_chrome_by_proxy():
    """Get a Chrome browser object (started with the proxy plug-in).

    :return: Chrome browser object
    """
    # `service` and `CONFIG` are module-level objects defined elsewhere in the project
    service.command_line_args()
    # Start the browser driver
    service.start()
    # Create the Chrome proxy plug-in
    plugin_path = create_proxyauth_extension(
        proxy_host=CONFIG["proxy_host"],
        proxy_port=int(CONFIG["proxy_port"]),
        proxy_username=CONFIG["proxy_user"],
        proxy_password=CONFIG["proxy_pwd"]
    )
    options = Options()
    options.add_argument("--start-maximized")
    # options.add_argument('--incognito')  # Incognito mode (dark window); keep disabled, it breaks the plug-in
    options.add_argument('--ignore-certificate-errors')  # Ignore certificate warnings
    options.add_experimental_option("detach", True)  # Do not close the browser automatically
    options.add_argument("--disable-gpu")  # Disable the GPU
    options.add_experimental_option("prefs", {
        "profile.managed_default_content_settings.images": 2})  # Do not load images
    options.add_experimental_option('excludeSwitches', ['enable-automation'])  # Hide the "controlled by automated software" prompt
    options.add_experimental_option('useAutomationExtension', False)  # Disable the automation extension
    options.add_experimental_option('w3c', False)  # Required for the performance log, see below
    # Ask the driver to record the performance (network) log
    caps = {
        'loggingPrefs': {
            'performance': 'ALL',
        }
    }
    options.add_extension(plugin_path)
    # options.add_experimental_option('prefs', prefs)
    # desired_capabilities is the Selenium 3 way of passing capabilities
    return webdriver.Chrome(options=options, desired_capabilities=caps)
Here is another magical pitfall. One option is mandatory:
options.add_experimental_option('w3c', False)
Without it, the following code raises an error:
chrome = driver.get_chrome()
chrome.get_log('performance')
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: log type 'performance' not found
(Session info: chrome=84.0.4147.68)
I was losing my mind: I had clearly defined performance, yet it insisted the type was not found. Worse, everything a broad search turned up was written the same way. On the verge of collapse several times, I suddenly noticed I was getting a result back. What on earth? After comparing versions I found the culprit was options.add_experimental_option('w3c', False). As I said before: I don't understand it, but this is what I learned from stepping into the pit.
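A possible explanation for this pitfall, offered as an assumption rather than something I verified in this post: ChromeDriver has defaulted to W3C mode since version 75, and in that mode the logging capability goes by the vendor-prefixed name `goog:loggingPrefs`. The bare `loggingPrefs` name belongs to the legacy protocol, which is what `w3c=False` switches back to. The helper below is a hypothetical sketch (the name `performance_log_capabilities` is mine) that only picks the capability key; it does not start a browser.

```python
def performance_log_capabilities(w3c_mode=True):
    """Return the capability dict that turns on the performance log.

    Assumption: W3C-mode ChromeDriver expects 'goog:loggingPrefs',
    legacy (w3c=False) sessions expect 'loggingPrefs'.
    """
    key = "goog:loggingPrefs" if w3c_mode else "loggingPrefs"
    return {key: {"performance": "ALL"}}


# Possible usage with an existing Options object (not executed here):
#   for name, value in performance_log_capabilities().items():
#       options.set_capability(name, value)
```

If this explanation holds, setting `goog:loggingPrefs` should make `get_log('performance')` work without turning W3C mode off, but I have only confirmed the `w3c=False` route myself.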
If you succeed , You will get a result like this :
[{
'level': 'INFO', 'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://www.baidu.com/","frameId":"3791AB5772F75D7BCFD86443B37239D9","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"B01BE6CD6AE25E589D833A7B8AA16036","request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.68 Safari/537.36","sec-ch-ua":"\\"\\\\\\\\Not\\\\\\"A;Brand\\";v=\\"99\\", \\"Chromium\\";v=\\"84\\", \\"Google Chrome\\";v=\\"84\\"","sec-ch-ua-mobile":"?0"},"initialPriority":"VeryHigh","method":"GET","mixedContentType":"none","referrerPolicy":"no-referrer-when-downgrade","url":"https://www.baidu.com/"},"requestId":"B01BE6CD6AE25E589D833A7B8AA16036","timestamp":613438.216976,"type":"Document","wallTime":1619428501.838738}},"webview":"3791AB5772F75D7BCFD86443B37239D9"}', 'timestamp': 1619428501840}]
This includes the request url, headers and params, and if there is a return value it also contains the response, and so on. Each message is standard JSON, so the data we want is easy to filter out. The only drawback is the huge volume of data. But in the end this was the only way that worked for me.
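Filtering that log might look like the sketch below. It assumes entries shaped like the sample above: each entry's `message` field is a JSON string whose inner `message` has a `method` and `params`. The function name `request_headers_from_logs` and the `url_contains` filter are my own illustration, and which header actually holds the token depends on the site.

```python
import json


def request_headers_from_logs(log_entries, url_contains=""):
    """Collect (url, headers) pairs from Network.requestWillBeSent events."""
    found = []
    for entry in log_entries:
        # The 'message' field is itself a JSON string, so decode it first
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.requestWillBeSent":
            continue  # skip responses, timing events and everything else
        request = message["params"]["request"]
        if url_contains in request["url"]:
            found.append((request["url"], request["headers"]))
    return found
```

With the browser from above, `request_headers_from_logs(chrome.get_log('performance'), 'baidu.com')` would return the headers of every matching request, and the token can then be read from the headers dict under whatever name the site uses.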
Because Selenium builds a fresh browser every time it starts, any site that requires login has to be logged into again on every run. That is painful for me, because login needs an email verification code: every login means sending an email to get the code. Good heavens, who can put up with that?
The answer is to carry cookies and skip login. I had tried this long ago, but it never worked; something always seemed off.
This time, just as I was about to give up, I found a blog post and tried it without much hope, and it worked. The moral: don't fixate too much on the result. Sometimes relaxing a little makes the goal easier to reach.
To sum up: if the cookies don't work, it is because too few of them were constructed. Open the browser's developer tools and look at how many there are.
That is how many cookies my current page needs. Constructing them one by one would take forever, so I use the EditThisCookie plug-in, which can quickly generate the cookie list and copy it.
Click export to generate the cookie list and copy it to the clipboard.
The data looks like this :
[
{
"domain": ".docusign.com",
"expirationDate": 1682127180,
"hostOnly": false,
"httpOnly": false,
"name": "__utma",
"path": "/",
"sameSite": "unspecified",
"secure": false,
"session": false,
"storeId": "0",
"value": "29959850.2106214803.1618810739.1619055204.1619055204.1",
"id": 1
},
{
"domain": ".docusign.com",
"expirationDate": 1634823180,
"hostOnly": false,
"httpOnly": false,
"name": "__utmz",
"path": "/",
"sameSite": "unspecified",
"secure": false,
"session": false,
"storeId": "0",
"value": "29959850.1619055204.1.1.utmcsr=support.docusign.com|utmccn=(referral)|utmcmd=referral|utmcct=/en/guides/ndse-admin-guide-delegated-admin",
"id": 2
},
{
"domain": ".docusign.com",
"expirationDate": 1626761640,
"hostOnly": false,
"httpOnly": false,
"name": "_fbp",
"path": "/",
"sameSite": "unspecified",
"secure": false,
"session": false,
"storeId": "0",
"value": "fb.1.1618985362069.319608315",
"id": 3
},
...
]
Then add the cookies in a loop, and that's it.
# The exported list is JSON, so false must be written as False in Python
cookies = [
    {
        "domain": ".docusign.com",
        "expirationDate": 1682127180,
        "hostOnly": False,
        "httpOnly": False,
        "name": "__utma",
        "path": "/",
        "sameSite": "unspecified",
        "secure": False,
        "session": False,
        "storeId": "0",
        "value": "29959850.2106214803.1618810739.1619055204.1619055204.1",
        "id": 1
    },
    ...
]
# Open the site first so the cookies are added for the right domain
chrome.get("https://appdemo.docusign.com/home")
for cookie in cookies:
    chrome.add_cookie(cookie)
chrome.refresh()  # Refresh
chrome.get("https://appdemo.docusign.com/home")  # Visit again
Note that after adding the cookies you need to refresh the browser and then visit the page again.
That is the record of this round of pitfalls; may you step into fewer of them. As for the ajax-hook.js approach, if you know how to use it, please tell me. My problem: executing the JS reports a syntax error.
Discussion is welcome! I will keep walking this road.
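Instead of pasting the export into the source and fixing `false` by hand, the list can be parsed as JSON and trimmed to the keys WebDriver cookies use. This is a sketch under my own assumptions: the function names are hypothetical, the key mapping (`expirationDate` to `expiry`, dropping `hostOnly`, `session`, `storeId` and `id`) is my reading of what `add_cookie` accepts, and the `sameSite` table assumes the export uses the Chrome cookies API values.

```python
import json

# Assumed mapping from exported sameSite values to WebDriver values;
# "unspecified" has no WebDriver equivalent and is simply left out.
_SAME_SITE = {"no_restriction": "None", "lax": "Lax", "strict": "Strict"}


def to_selenium_cookie(exported):
    """Convert one exported cookie dict into a dict suitable for add_cookie."""
    cookie = {
        "name": exported["name"],
        "value": exported["value"],
        "path": exported.get("path", "/"),
        "secure": bool(exported.get("secure", False)),
        "httpOnly": bool(exported.get("httpOnly", False)),
    }
    if "domain" in exported:
        cookie["domain"] = exported["domain"]
    if "expirationDate" in exported:
        # WebDriver expects whole seconds under the name 'expiry'
        cookie["expiry"] = int(exported["expirationDate"])
    same_site = _SAME_SITE.get(exported.get("sameSite"))
    if same_site:
        cookie["sameSite"] = same_site
    return cookie


def load_exported_cookies(text):
    """Parse the clipboard text (a JSON list) into add_cookie-ready dicts."""
    return [to_selenium_cookie(item) for item in json.loads(text)]
```

The loop then becomes `for cookie in load_exported_cookies(text): chrome.add_cookie(cookie)`, where `text` is the string copied from EditThisCookie, followed by the same refresh and revisit.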