Author: Dennis Brinkrolf
Translator: Cat under peas @Python cat
Original title: 10 Unknown Security Pitfalls for Python
English original: https://blog.sonarsource.com/10-unknown-security-pitfalls-for-python
When Python developers rely on the standard library and common frameworks, they tend to assume that their programs are reasonably secure. However, in Python, as in any other programming language, some features can be misunderstood or misused. Often it is only a subtle nuance or detail that a developer overlooks, and that is enough to introduce a serious security vulnerability into the code.
In this post, we share 10 security pitfalls that we have encountered in real-world Python projects. We chose pitfalls that are not widely known in the developer community. By explaining each issue and its impact, we hope to raise awareness of these problems and sharpen everyone's security mindset. If you are using any of these features, be sure to check your Python code!
Python supports executing code in an optimized mode (for example, with the -O flag). This makes the code run faster and use less memory, which is particularly useful when an application runs at large scale or when few resources are available. Some prepackaged Python programs ship optimized bytecode.
However, when code is optimized, all assert statements are ignored. Developers sometimes use them to verify certain conditions in the code. If, for example, an assertion is used for an authentication check, this can result in a security bypass.
def superuser_action(request, user):
    assert user.is_super_user
    # execute action as super user
In this example, the assert statement on line 2 is ignored in optimized mode, so a non-superuser can reach the next line of code. Using assert statements for security-related checks is not recommended, but we do see them in real-world projects.
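A minimal sketch of an explicit check that still runs in optimized mode. The `User` class is a hypothetical stand-in for the application's user model, and raising `PermissionError` is only one possible way to signal the failure; both are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for the application's user model."""
    name: str
    is_super_user: bool = False


def superuser_action(user: User) -> str:
    # An explicit check is an ordinary statement, so it is still executed
    # when Python runs with -O or -OO (which strip assert statements).
    if not user.is_super_user:
        raise PermissionError(f"{user.name} is not a superuser")
    # Execute the action as superuser.
    return f"action executed for {user.name}"
```

Unlike the assert version, this function behaves the same with and without optimization.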
The os.makedirs function creates one or more folders in the file system. Its second parameter, mode, specifies the default permissions of the created folders. On line 2 of the following code, the folders A/B/C are created with rwx------ (0o700) permissions. This means that only the current user (the owner) has read, write, and execute permissions for these folders.
def init_directories(request):
    os.makedirs("A/B/C", mode=0o700)
    return HttpResponse("Done!")
In Python 3.6 and earlier, the folders A, B, and C are each created with permission 700. In Python 3.7 and later, however, only the last folder, C, gets permission 700; the intermediate folders A and B are created with the default permissions (typically 755, depending on the umask).
In those newer versions, os.makedirs therefore behaves like this Linux command: mkdir -m 700 -p A/B/C. Some developers are not aware of this difference between versions. It has already led to a privilege escalation vulnerability in Django (CVE-2020-24583) and, for very similar reasons, to a hardening bypass in WordPress.
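One way to get the same permissions on every level is to create the missing folders one at a time. The following is only a sketch; the helper name `makedirs_private` is hypothetical, and it assumes a POSIX file system where permission bits are meaningful.

```python
import os
import stat


def makedirs_private(path: str, mode: int = 0o700) -> None:
    """Create each missing directory in `path` with the same `mode`."""
    current = os.path.abspath(path)
    missing = []
    # Walk upwards until an existing directory is found.
    while not os.path.isdir(current):
        missing.append(current)
        current = os.path.dirname(current)
    # Create the missing directories from the outermost to the innermost.
    for directory in reversed(missing):
        os.mkdir(directory, mode)
        # The mode passed to mkdir is masked by the umask, so set it again.
        os.chmod(directory, mode)
```

After `makedirs_private("A/B/C")`, `stat.S_IMODE(os.stat("A").st_mode)` can be used to confirm that the intermediate folders carry the intended mode as well.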
The os.path.join(path, *paths) function joins multiple path components into a single path. The first parameter usually contains the base path, and each further parameter is appended to it as a component.
However, this function has a little-known feature. If one of the appended components starts with /, all preceding components, including the base path, are discarded, and that component is treated as an absolute path. The following example shows this pitfall.
def read_file(request):
    filename = request.POST['filename']
    file_path = os.path.join("var", "lib", filename)
    if file_path.find(".") != -1:
        return HttpResponse("Failed!")
    with open(file_path) as f:
        return HttpResponse(f.read(), content_type='text/plain')
On line 3, we use os.path.join to build the target path from the user-supplied file name. On line 4, the resulting path is checked for a "." character to prevent path traversal vulnerabilities.
However, if the attacker passes the file name "/a/b/c.txt", the file_path variable built on line 3 is an absolute path (/a/b/c.txt). In other words, os.path.join ignores the "var/lib" part, so an attacker can read files outside the intended folder without using a "." sequence for traversal. Although this behavior is described in the os.path.join documentation, it has led to numerous vulnerabilities in the past (Cuckoo Sandbox Evasion, CVE-2020-35736).
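A common mitigation is to resolve the joined path and verify that it still lies inside the base folder. The sketch below shows the idea; the helper name `safe_join` is hypothetical, and whether resolving symbolic links is desirable depends on the application.

```python
import os


def safe_join(base_dir: str, user_path: str) -> str:
    """Join `user_path` onto `base_dir` and refuse results outside of it."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and resolves symbolic links.
    candidate = os.path.realpath(os.path.join(base, user_path))
    # An absolute or traversing `user_path` ends up outside of `base`.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate
```

With this check, both an absolute file name such as "/a/b/c.txt" and a relative one such as "../x" are rejected, independent of which characters they contain.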
The tempfile.NamedTemporaryFile function creates a temporary file with a specific name. However, its prefix and suffix parameters are vulnerable to path traversal attacks (Issue 35278). An attacker who controls one of these parameters can create a temporary file anywhere in the file system. The following example shows a pitfall that developers may run into.
def touch_tmp_file(request):
    id = request.GET['id']
    tmp_file = tempfile.NamedTemporaryFile(prefix=id)
    return HttpResponse(f"tmp file: {tmp_file} created!", content_type='text/plain')
On line 3, the user-supplied id is used as the prefix of the temporary file. If the attacker passes "/../var/www/test" as the id parameter, the following temporary file is created: /var/www/test_zdllj17. At first glance this may look harmless, but it gives an attacker a foundation for exploiting more complex vulnerabilities.
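One possible defense is to accept only prefixes made of harmless characters before passing them on. This is a sketch under assumptions: the helper name `create_tmp_file` and the allowed character set are illustrative choices, not part of the tempfile API.

```python
import re
import tempfile

# Assumption: letters, digits, "_" and "-" are enough for a prefix.
_SAFE_PREFIX = re.compile(r"[A-Za-z0-9_-]{1,32}")


def create_tmp_file(prefix: str):
    """Create a named temporary file after validating the prefix."""
    # fullmatch rejects path separators, dots and any other character
    # that could move the file out of the temporary directory.
    if not _SAFE_PREFIX.fullmatch(prefix):
        raise ValueError("invalid prefix")
    return tempfile.NamedTemporaryFile(prefix=prefix)
```

A call such as `create_tmp_file("/../var/www/test")` now raises `ValueError` instead of creating a file outside the temporary directory.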
Web applications often need to extract uploaded archives. In Python, it is widely known that the TarFile.extractall and TarFile.extract functions are vulnerable to Zip Slip attacks: the attacker tampers with the file names in the archive so that they contain path traversal (../) sequences.
This is why archives should always be treated as an untrusted source. The zipfile.extractall and zipfile.extract functions sanitize the zip entries and thereby prevent such path traversal vulnerabilities.
However, this does not mean that path traversal vulnerabilities cannot occur with the ZipFile library. The following code extracts files from an archive.
def extract_html(request):
    filename = request.FILES['filename']
    zf = zipfile.ZipFile(filename.temporary_file_path(), "r")
    for entry in zf.namelist():
        if entry.endswith(".html"):
            file_content = zf.read(entry)
            with open(entry, "wb") as fp:
                fp.write(file_content)
    zf.close()
    return HttpResponse("HTML files extracted!")
Line 3 creates a ZipFile handler from the temporary path of the uploaded file. Lines 4-8 extract all entries that end in ".html". The zf.namelist function on line 4 returns the names of the entries in the zip archive. Note that only zipfile.extract and zipfile.extractall sanitize the entries; no other function does.
In this case, an attacker can create an entry name such as "../../../var/www/html" and fill it with arbitrary content. The content of the malicious file is read on line 6 and written to the attacker-controlled path on lines 7-8. As a result, an attacker can create arbitrary HTML files anywhere on the server.
As mentioned above, files in an archive should be considered untrusted. If you do not use zipfile.extractall or zipfile.extract, you must sanitize the names of the zip entries, for example with os.path.basename. Otherwise, this can lead to serious security vulnerabilities, such as the one found in the NLTK Downloader (CVE-2019-14751).
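The os.path.basename suggestion could look roughly like the sketch below. The function name `extract_html_entries` and the `dest_dir` parameter are assumptions for illustration; the sketch also flattens any folder structure inside the archive, which may or may not be acceptable.

```python
import os
import zipfile


def extract_html_entries(zip_path: str, dest_dir: str) -> list:
    """Write the .html entries of an archive into `dest_dir` only."""
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for entry in zf.namelist():
            # basename drops every folder component, including "../".
            name = os.path.basename(entry)
            if not name.endswith(".html"):
                continue
            target = os.path.join(dest_dir, name)
            with open(target, "wb") as fp:
                fp.write(zf.read(entry))
            written.append(target)
    return written
```

An entry named "../../evil.html" is then written as "evil.html" inside `dest_dir` rather than two levels above it.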
Regular expressions (regex) are an integral part of most web applications. We often see them used in custom Web Application Firewalls (WAFs) for input validation, for example to detect malicious strings. In Python, there is a subtle difference between re.match and re.search, which the following code snippet demonstrates.
def is_sql_injection(request):
    pattern = re.compile(r".*(union)|(select).*")
    name_to_test = request.GET['name']
    if re.match(pattern, name_to_test):
        return True
    return False
On line 2, we define a pattern that matches union or select in order to detect a possible SQL injection. This is bad practice because such blacklists are easy to bypass, but we have seen it in production applications. On line 4, the re.match function uses the previously defined pattern to check whether the user input from line 3 contains any of these malicious values.
However, unlike re.search, re.match only matches at the beginning of the string, and "." does not match a newline by default. For example, if an attacker submits the value aaaaaa \n union select, the input does not match the regular expression. The check can therefore be bypassed, and the protection is lost.
In short, we do not recommend using a regular expression blacklist for any security check.
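The difference can be reproduced in a few lines with the pattern and the payload from the text. This only illustrates the behavior of the two functions; it is not a recommendation to keep the blacklist and switch to re.search.

```python
import re

pattern = re.compile(r".*(union)|(select).*")
payload = "aaaaaa \n union select"

# re.match anchors at the start of the string; "." stops at the newline,
# so neither alternative can match from position 0.
matched = re.match(pattern, payload)

# re.search tries every position and finds " union" after the newline.
searched = re.search(pattern, payload)

print(matched is None)       # True: the blacklist check is bypassed
print(searched is not None)  # True: the same payload is detected
```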
Unicode allows a character to be represented in several forms and maps these characters to code points. The Unicode standard defines four normalization forms for such characters. Programs can use these normalizations to store data, such as user names, in a uniform way that is independent of human language.
However, attackers can exploit these normalizations, which has already led to a vulnerability in Python's urllib (CVE-2019-9636). The following code snippet demonstrates a cross-site scripting (XSS) vulnerability based on NFKC normalization.
import unicodedata
from django.shortcuts import render
from django.utils.html import escape

def render_input(request):
    user_input = escape(request.GET['p'])
    normalized_user_input = unicodedata.normalize("NFKC", user_input)
    context = {'my_input': normalized_user_input}
    return render(request, 'test.html', context)
On line 6, the user input is processed by Django's escape function to prevent XSS vulnerabilities. On line 7, the sanitized input is normalized with the NFKC algorithm so that it renders correctly through the test.html template on lines 8-9.
templates/test.html
<!DOCTYPE html>
<html lang="en">
<body>
{{ my_input | safe }}
</body>
</html>
In the test.html template, the my_input variable on line 4 is marked with safe because the developer expects special characters and assumes that the variable has already been sanitized by the escape function. Because of the safe keyword, Django does not sanitize the variable again.
However, because of the normalization on line 7 (view.py), the character "%EF%B9%A4" is converted to "<" and "%EF%B9%A5" is converted to ">". This allows an attacker to inject arbitrary HTML tags and trigger an XSS vulnerability. To prevent this vulnerability, always normalize the user input first and sanitize it afterwards.
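The effect of the ordering can be shown without Django. The sketch below uses html.escape from the standard library as a stand-in for Django's escape function, which is an assumption made to keep the example self-contained; the two function names are illustrative.

```python
import html
import unicodedata


def escape_then_normalize(user_input: str) -> str:
    # Vulnerable order: escaping sees no "<" or ">", then NFKC creates them.
    return unicodedata.normalize("NFKC", html.escape(user_input))


def normalize_then_escape(user_input: str) -> str:
    # Safer order: normalize first, then escape what is actually rendered.
    return html.escape(unicodedata.normalize("NFKC", user_input))


# U+FE64 and U+FE65 are the characters encoded as %EF%B9%A4 and %EF%B9%A5.
payload = "\ufe64script\ufe65"
print(escape_then_normalize(payload))  # <script>
print(normalize_then_escape(payload))  # &lt;script&gt;
```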
As mentioned, Unicode characters are mapped to code points. There are many different human languages, however, and Unicode tries to unify them. This means that different characters are likely to share the same "layout". For example, the lowercase Turkish ı (without a dot) becomes I when converted to uppercase. In the Latin alphabet, the character i also becomes I in uppercase. In Unicode terms, these two different characters map to the same code point when converted to uppercase.
This behavior is exploitable and has already led to a critical vulnerability in Django (CVE-2019-19844). The following code is an example of a password reset.
from django.core.mail import send_mail
from django.http import HttpResponse
from vuln.models import User

def reset_pw(request):
    email = request.GET['email']
    result = User.objects.filter(email__exact=email.upper()).first()
    if not result:
        return HttpResponse("User not found!")
    send_mail('Reset Password','Your new pw: 123456.', 'from@example.com', [email], fail_silently=False)
    return HttpResponse("Password reset email send!")
Line 6 reads the user-supplied email, and lines 7-9 check whether a user with this email exists. If so, line 10 sends an email to the user, using the email address supplied on line 6. It is important to note that the check on lines 7-9 is case-insensitive because it uses the upper function.
For the attack, let us assume that a user with an email address such as foo@mix.com exists in the database. An attacker can then simply pass in foo@mıx.com as the email on line 6, with the i replaced by the Turkish ı. Line 7 converts the address to uppercase, which results in FOO@MIX.COM. A user is therefore found, and a password reset email is sent.
However, the email is sent to the unconverted address from line 6, which still contains the Turkish ı. In other words, another user's password is sent to an email address controlled by the attacker. To prevent this vulnerability, replace the recipient on line 10 with the user's email address from the database. Even if a collision occurs, the attacker gains nothing in this case.
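The collision and the suggested fix can be sketched without a database. In the snippet below, the list of stored addresses stands in for the user table, and both the function name `reset_recipient` and the example addresses are hypothetical.

```python
from typing import List, Optional


def reset_recipient(submitted: str, stored_emails: List[str]) -> Optional[str]:
    """Return the stored address to mail, never the submitted one."""
    wanted = submitted.upper()
    for stored in stored_emails:
        # Same case-insensitive comparison as in the example above.
        if stored.upper() == wanted:
            # The mail goes to the address from the database, so a
            # look-alike address gains the attacker nothing.
            return stored
    return None


print("ı".upper() == "i".upper())                       # True
print(reset_recipient("foo@mıx.com", ["foo@mix.com"]))  # foo@mix.com
```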
In Python < 3.8, the ipaddress library normalizes IP addresses by removing leading zeros. At first glance this behavior may look harmless, but it has already led to a high-severity vulnerability in Django (CVE-2021-33571). An attacker can use the normalization to bypass a validator and launch a Server-Side Request Forgery (SSRF) attack.
The following code shows how such a validator can be bypassed.
import requests
import ipaddress

def send_request(request):
    ip = request.GET['ip']
    try:
        if ip in ["127.0.0.1", "0.0.0.0"]:
            return HttpResponse("Not allowed!")
        ip = str(ipaddress.IPv4Address(ip))
    except ipaddress.AddressValueError:
        return HttpResponse("Error at validation!")
    requests.get('https://' + ip)
    return HttpResponse("Request send!")
Line 5 reads an IP address supplied by the user, and line 7 uses a blacklist to check whether the IP is a local address in order to prevent a possible SSRF vulnerability. The blacklist is not complete and serves only as an example.
Line 9 checks whether the IP is an IPv4 address and normalizes it at the same time. After this validation, line 12 sends the actual request to the IP.
However, an attacker can pass in an IP address such as 127.0.001, which is not found in the blacklist on line 7. Line 9 then uses ipaddress.IPv4Address to normalize the IP to 127.0.0.1. The attacker can therefore bypass the SSRF validator and send requests to local network addresses.
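A more robust validator parses the address first and then decides based on the parsed value instead of the raw string. The following is a sketch, not a complete SSRF defense; the function name `is_request_allowed` and the choice to allow only globally routable addresses are assumptions.

```python
import ipaddress


def is_request_allowed(raw_ip: str) -> bool:
    """Return True only for valid, globally routable IPv4 addresses."""
    try:
        # Parse first, so every later check sees the normalized address.
        address = ipaddress.IPv4Address(raw_ip)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    # is_global is False for loopback, private and unspecified addresses.
    return address.is_global
```

With this order, an input such as 127.0.001 is rejected either because parsing fails or because the normalized address is a loopback address.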
In Python < 3.7, the urllib.parse.parse_qsl function accepts both ";" and "&" as separators for URL query variables. Interestingly, other languages do not recognize the ";" character as a separator.
The following example shows why this behavior can lead to vulnerabilities. Suppose we run an infrastructure in which the front end is a PHP application and the back end is a Python application.
An attacker sends the following GET request to the PHP front end:
GET https://victim.com/?a=1;b=2
The PHP front end recognizes only one query parameter, "a", with the content "1;b=2". PHP does not treat the ";" character as a separator for query parameters. The front end now forwards the attacker's request directly to the internal Python application:
GET https://internal.backend/?a=1;b=2
If it uses urllib.parse.parse_qsl, the Python application parses the request into two query parameters, "a=1" and "b=2". This difference in query parameter parsing can lead to fatal security vulnerabilities, such as the web cache poisoning vulnerability in Django (CVE-2021-23336).
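The two interpretations of the same query string can be compared directly. The sketch assumes a Python release in which parse_qsl has the `separator` parameter (it was introduced together with the fix for CVE-2021-23336); in such releases only "&" is a separator by default.

```python
from urllib.parse import parse_qsl

query = "a=1;b=2"

# Default in patched releases: only "&" separates parameters, which
# matches how the PHP front end in the example reads the query.
default_pairs = parse_qsl(query)

# Explicitly splitting on ";" gives the back-end view described above.
semicolon_pairs = parse_qsl(query, separator=";")

print(default_pairs)    # [('a', '1;b=2')]
print(semicolon_pairs)  # [('a', '1'), ('b', '2')]
```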
In this post, we introduced 10 Python security pitfalls that we believe are not well known among developers. Each of these subtle pitfalls is easy to overlook and has led to security vulnerabilities in production applications in the past.
As shown above, security pitfalls can occur in all kinds of operations, from handling files, folders, archives, URLs, and IP addresses to simple strings. A common pattern is the use of library functions that can behave in unexpected ways. This reminds us to always upgrade to the latest version and to read the documentation carefully. At SonarSource, we are researching these pitfalls in order to continuously improve our code analyzers in the future.