Author: Dennis Brinkrolf
Translator: Cat under peas @Python cat
Original title: 10 Unknown Security Pitfalls for Python
English original: https://blog.sonarsource.com/10-unknown-security-pitfalls-for-python
Statement: This translation is intended for communication and learning and is released under the CC BY-NC-SA 4.0 license. The content has been changed slightly for readability.
Python developers often assume that using the standard library and common frameworks makes their programs reliably secure. However, in Python, as in any other programming language, some features can be misunderstood or misused. Often it takes only a few subtleties or overlooked details for a developer to introduce a serious security vulnerability into the code.
In this post, we share 10 security pitfalls that we encountered in real-world Python projects. We chose pitfalls that are not widely known in the developer community. By explaining each pitfall and its impact, we hope to raise awareness of these problems and sharpen your security mindset. If you use any of these features, be sure to check your Python code!
Python supports executing code in an optimized mode. This makes the code run faster and use less memory, which is particularly effective when an application runs at scale or when few resources are available. Some prepackaged Python applications ship optimized bytecode.
However, when code is optimized, all assert statements are ignored. Developers sometimes use them to check that certain conditions hold in the code. If an assertion is used for an authentication check, for example, this can result in a security bypass.
def superuser_action(request, user):
    assert user.is_super_user
    # execute action as super user
In this example, the assert statement on line 2 is ignored, so non-superusers can also reach the following lines of code. Using assert statements for security-related checks is not recommended, but we do see them in real-world projects.
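The safer alternative can be sketched as follows. This is a minimal sketch, not code from the original post: the `User` class is a hypothetical stand-in for the application's user model. An explicit check that raises an exception keeps working in optimized mode (for example when Python is started with the `-O` flag), whereas an assert is stripped.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for the application's user model."""
    name: str
    is_super_user: bool = False


def superuser_action(user):
    # An explicit check is not removed in optimized mode, unlike assert.
    if not user.is_super_user:
        raise PermissionError(f"{user.name} is not a superuser")
    # execute action as super user
    return "done"
```

Calling `superuser_action(User("guest"))` raises `PermissionError` regardless of how the interpreter was started.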
The os.makedirs function creates one or more folders in the file system. Its second parameter, mode, specifies the default permissions of the created folders. On line 2 of the following code, the folders A/B/C are created with rwx------ (0o700) permissions. This means that only the current user (the owner) has read, write, and execute permissions on these folders.
def init_directories(request):
    os.makedirs("A/B/C", mode=0o700)
    return HttpResponse("Done!")
In Python 3.6 and earlier, the folders A, B, and C are each created with permission 700. In Python 3.7 and later, however, only the last folder C gets permission 700, while the intermediate folders A and B are created with the default permission 755.
As a result, in Python 3.7 and later the os.makedirs function is equivalent to the Linux command mkdir -m 700 -p A/B/C.
Some developers are not aware of this difference between versions. It has already led to a permission escalation vulnerability in Django (CVE-2020-24583) and, in a very similar way, to a hardening bypass in WordPress.
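One way to avoid depending on the interpreter version is to create every level explicitly. The following is a minimal sketch under stated assumptions: `makedirs_private` is a hypothetical helper (not part of the standard library), it assumes a POSIX system, and it assumes `relative_path` is a trusted, "/"-separated path.

```python
import os
import tempfile


def makedirs_private(base, relative_path, mode=0o700):
    """Create each missing folder below base with the same explicit mode."""
    current = base
    for part in relative_path.split("/"):
        current = os.path.join(current, part)
        if not os.path.isdir(current):
            os.mkdir(current, mode)
            # mkdir's mode is filtered by the umask, so set it explicitly.
            os.chmod(current, mode)
    return current


# Usage in a throwaway directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as base:
    makedirs_private(base, "A/B/C")
    modes = [
        oct(os.stat(os.path.join(base, rel)).st_mode & 0o777)
        for rel in ("A", "A/B", "A/B/C")
    ]
```

Here all three folders, not only the last one, end up with the requested permission.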
The os.path.join(path, *paths) function joins multiple path components into one combined path. The first parameter usually contains the base path, and each following parameter is appended to the base path as a component.
However, this function has a little-known feature. If one of the joined components starts with /, all previous components, including the base path, are discarded, and this component is treated as an absolute path. The following example shows this pitfall as developers may encounter it.
def read_file(request):
    filename = request.POST['filename']
    file_path = os.path.join("var", "lib", filename)
    if file_path.find(".") != -1:
        return HttpResponse("Failed!")
    with open(file_path) as f:
        return HttpResponse(f.read(), content_type='text/plain')
On line 3, the os.path.join function builds the target path from the file name supplied by the user. On line 4, the resulting path is checked for a "." to prevent a path traversal vulnerability.
However, if the attacker passes "/a/b/c.txt" as the file name, the variable file_path on line 3 becomes an absolute path (/a/b/c.txt). os.path.join ignores the "var/lib" part, so the attacker can read arbitrary files without using a "." character. Although this behavior is described in the documentation of os.path.join, it has led to many vulnerabilities (Cuckoo Sandbox Evasion, CVE-2020-35736).
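One possible mitigation is to resolve the joined path and verify that it still lies inside the base directory. The following is a minimal sketch, assuming a POSIX system; `safe_join` is a hypothetical helper name, not a standard library function.

```python
import os


def safe_join(base_dir, user_path):
    """Join user_path onto base_dir and reject results outside base_dir."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate


# On POSIX systems the absolute component discards "var" and "lib".
joined = os.path.join("var", "lib", "/a/b/c.txt")
```

Because the check runs on the resolved path, it covers both absolute components and "../" sequences.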
The tempfile.NamedTemporaryFile function creates a temporary file with a specific name. However, the prefix and suffix parameters are vulnerable to path traversal attacks (Issue 35278). An attacker who controls one of these parameters can create a temporary file anywhere in the file system. The following example shows a pitfall that developers may run into.
def touch_tmp_file(request):
    id = request.GET['id']
    tmp_file = tempfile.NamedTemporaryFile(prefix=id)
    return HttpResponse(f"tmp file: {tmp_file} created!", content_type='text/plain')
On line 3, the id supplied by the user is used as the prefix of the temporary file. If an attacker passes "/../var/www/test" as the id parameter, the following temporary file is created: /var/www/test_zdllj17. At first glance this may seem harmless, but it gives attackers a basis for exploiting more complex vulnerabilities.
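A simple defense is to validate the prefix before passing it on. The following is a minimal sketch; the allowlist pattern and the name `create_tmp_file` are assumptions made for illustration, based on the assumption that identifiers are short and contain no path characters.

```python
import os
import re
import tempfile

# Assumption: identifiers are short and purely alphanumeric (plus _ and -).
SAFE_PREFIX = re.compile(r"[A-Za-z0-9_-]{1,32}")


def create_tmp_file(prefix):
    """Create a temporary file only if the prefix has no path characters."""
    if not SAFE_PREFIX.fullmatch(prefix):
        raise ValueError("invalid prefix")
    return tempfile.NamedTemporaryFile(prefix=prefix)


# A harmless prefix works; the file is removed when the block exits.
with create_tmp_file("report") as tmp:
    created_name = os.path.basename(tmp.name)
```

A prefix such as "/../var/www/test" is rejected with `ValueError` before any file is created.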
Web applications often need to extract uploaded archives. In Python, it is well known that the TarFile.extractall and TarFile.extract functions are vulnerable to the Zip Slip attack, in which an attacker tampers with the file names inside an archive so that they contain path traversal (../) sequences.
This is why archives should always be treated as an untrusted source. The zipfile.extractall and zipfile.extract functions sanitize zip entries and thereby prevent such path traversal vulnerabilities.
However, this does not mean that path traversal vulnerabilities cannot occur with the ZipFile library. The following code extracts files from a zip archive.
def extract_html(request):
    filename = request.FILES['filename']
    zf = zipfile.ZipFile(filename.temporary_file_path(), "r")
    for entry in zf.namelist():
        if entry.endswith(".html"):
            file_content = zf.read(entry)
            with open(entry, "wb") as fp:
                fp.write(file_content)
    zf.close()
    return HttpResponse("HTML files extracted!")
Line 3 creates a ZipFile handler from the temporary path of the file uploaded by the user. Lines 4-8 extract all zip entries that end with ".html". The zf.namelist function on line 4 returns the names of the entries inside the zip file. Note that only the zipfile.extract and zipfile.extractall functions sanitize the entries; no other function does.
In this case, an attacker can create an entry name such as "../../../var/www/html" with arbitrary content. The content of the malicious file is read on line 6 and written to the attacker-controlled path on lines 7-8. As a result, the attacker can create arbitrary HTML files anywhere on the server.
As mentioned above, files inside an archive should be considered untrusted. If you do not use zipfile.extractall or zipfile.extract, you have to sanitize the names of the zip entries, for example with os.path.basename. Otherwise, serious security vulnerabilities can result, such as the one found in the NLTK Downloader (CVE-2019-14751).
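The os.path.basename approach can be sketched as follows. This is an illustrative sketch rather than the original application's code: it takes a file-like archive and a destination folder instead of a Django request, and the in-memory archive exists only to demonstrate the behavior.

```python
import io
import os
import tempfile
import zipfile


def extract_html(archive, dest_dir):
    """Write the .html entries of archive into dest_dir, ignoring entry paths."""
    written = []
    with zipfile.ZipFile(archive) as zf:
        for entry in zf.namelist():
            name = os.path.basename(entry)  # drops "../" and any folders
            if not name.endswith(".html"):
                continue
            target = os.path.join(dest_dir, name)
            with open(target, "wb") as fp:
                fp.write(zf.read(entry))
            written.append(target)
    return written


# Build a malicious archive in memory and extract it into a scratch folder.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("../../evil.html", "<h1>hi</h1>")
payload.seek(0)

with tempfile.TemporaryDirectory() as dest:
    written = extract_html(payload, dest)
    stayed_inside = [os.path.dirname(p) == dest for p in written]
```

One trade-off of this design is that entries from different folders with the same base name overwrite each other, so it only fits cases where a flat output folder is acceptable.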
Regular expressions (regex) are an integral part of most web applications. We often see them used by custom web application firewalls (WAFs) for input validation, for example to detect malicious strings. In Python, there is a subtle difference between re.match and re.search, which we demonstrate in the following code snippet.
def is_sql_injection(request):
    pattern = re.compile(r".*(union)|(select).*")
    name_to_test = request.GET['name']
    if re.match(pattern, name_to_test):
        return True
    return False
On line 2, we define a pattern that matches union or select in order to detect possible SQL injection. This is a bad idea, because such blacklists are easy to bypass, but we have seen it in real-world applications. On line 4, the re.match function uses the previously defined pattern to check whether the user input from line 3 contains any of these malicious values.
However, unlike re.search, the re.match function does not match across new lines: it only matches at the beginning of the string, and by default "." does not match a newline character. For example, if an attacker submits the value aaaaaa \n union select, the input does not match the regular expression. The check is therefore bypassed and the protection is lost.
In short, we do not recommend using regular expression blacklists for any security check.
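The difference between the two functions can be reproduced with the pattern and payload from the example above. This is only a demonstration of the behavior, not a recommended filter.

```python
import re

pattern = re.compile(r".*(union)|(select).*")
payload = "aaaaaa \n union select"

# re.match anchors at the start, and "." stops at the newline.
match_result = re.match(pattern, payload)

# re.search scans the whole string and finds "union" after the newline.
search_result = re.search(pattern, payload)
```

Here `match_result` is `None` while `search_result` is a match object, so a filter built on re.match lets the payload through.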
Unicode allows characters to be represented in multiple forms and maps these characters to code points. The Unicode standard defines four normalization forms for these different representations. Applications can use normalization to store data, such as usernames, in a uniform way that is independent of human language.
However, attackers can exploit this normalization, which has already led to a vulnerability in Python's urllib (CVE-2019-9636). The following code snippet demonstrates a cross-site scripting (XSS) vulnerability based on NFKC normalization.
import unicodedata
from django.shortcuts import render
from django.utils.html import escape

def render_input(request):
    user_input = escape(request.GET['p'])
    normalized_user_input = unicodedata.normalize("NFKC", user_input)
    context = {'my_input': normalized_user_input}
    return render(request, 'test.html', context)
On line 6, the user input is processed by Django's escape function to prevent XSS vulnerabilities. On line 7, the sanitized input is normalized with the NFKC algorithm so that it renders correctly through the test.html template on lines 8-9.
templates/test.html
<!DOCTYPE html>
<html lang="en">
<body>
{{ my_input | safe}}
</body>
</html>
In the test.html template, the variable my_input on line 4 is marked as safe, because the developer expects special characters and assumes that the variable has already been sanitized by the escape function. With the safe keyword, Django does not sanitize the variable again.
However, because of the normalization on line 7 (view.py), the character "%EF%B9%A4" is converted to "<" and "%EF%B9%A5" is converted to ">". This allows an attacker to inject arbitrary HTML tags and trigger an XSS vulnerability. To prevent it, always sanitize user input after normalizing it, not before.
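The effect of the ordering can be shown with the standard library alone. In this sketch, `html.escape` stands in for Django's escape function (an assumption made so the example runs without Django), and `clean_input` is a hypothetical helper name.

```python
import html
import unicodedata


def clean_input(user_input):
    """Normalize first, then escape, so new '<' or '>' are still escaped."""
    normalized = unicodedata.normalize("NFKC", user_input)
    return html.escape(normalized)


payload = "\ufe64script\ufe65"  # SMALL LESS-THAN SIGN, SMALL GREATER-THAN SIGN

# Escaping first leaves the payload untouched; normalizing then creates tags.
vulnerable = unicodedata.normalize("NFKC", html.escape(payload))
safe = clean_input(payload)
```

With the vulnerable ordering the result contains a literal script tag, while `clean_input` returns the escaped form.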
Remember that Unicode characters are mapped to code points. However, there are many different human languages, and Unicode tries to unify them. This means that different characters are likely to share the same "layout". For example, the lowercase Turkish ı (dotless i) becomes I when converted to uppercase. In Latin-based alphabets, the character i also becomes I in uppercase. In Unicode terms, these two different characters map to the same code point when uppercased.
This behavior can be exploited and has in fact already led to a critical vulnerability in Django (CVE-2019-19844). The following code shows an example of a password reset.
from django.core.mail import send_mail
from django.http import HttpResponse
from vuln.models import User

def reset_pw(request):
    email = request.GET['email']
    result = User.objects.filter(email__exact=email.upper()).first()
    if not result:
        return HttpResponse("User not found!")
    send_mail('Reset Password','Your new pw: 123456.', '[email protected]', [email], fail_silently=False)
    return HttpResponse("Password reset email send!")
Line 6 reads the email entered by the user, and lines 7-9 check whether a user with this email exists. If the user exists, line 10 sends an email to the address entered on line 6. It is important to note that the email check on lines 7-9 is case-insensitive because it uses the upper function.
For the attack, let's assume that the database contains a user with the email address [email protected]. An attacker can then simply pass [email protected]ıx.com as the email on line 6, where the i is replaced with the Turkish ı. Line 7 converts the email to uppercase, which results in [email protected]. This means that a user is found, so a password reset email is sent.
However, the email is sent to the unconverted address from line 6, which still contains the Turkish ı. In other words, another user's password is sent to an email address controlled by the attacker. To prevent this vulnerability, line 10 can use the user's email address from the database instead. Even if an encoding collision occurs, the attacker gains nothing in that case.
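The collision and the fix can be sketched without Django. Everything named here is hypothetical: the `USERS` dictionary stands in for the database, and the addresses are made-up examples, not the ones from the original post.

```python
# Hypothetical user store: maps the uppercased address to the stored one.
USERS = {"USER@MIX.EXAMPLE": "user@mix.example"}


def reset_recipient(submitted_email):
    """Return the address the reset mail should go to, or None."""
    # Always answer with the stored address, never the submitted one.
    return USERS.get(submitted_email.upper())


attacker_input = "user@m\u0131x.example"  # U+0131 is the dotless ı
recipient = reset_recipient(attacker_input)
```

The lookup still succeeds for the attacker's input, but the mail goes to the address stored for the legitimate user.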
In Python < 3.8, IP addresses are normalized by the ipaddress library, so leading zeros are removed. At first glance this behavior may seem harmless, but it has already led to a high-severity vulnerability in Django (CVE-2021-33571). An attacker can exploit the normalization to bypass validators and launch a server-side request forgery (SSRF) attack.
The following code shows how such a validator can be bypassed.
import requests
import ipaddress

def send_request(request):
    ip = request.GET['ip']
    try:
        if ip in ["127.0.0.1", "0.0.0.0"]:
            return HttpResponse("Not allowed!")
        ip = str(ipaddress.IPv4Address(ip))
    except ipaddress.AddressValueError:
        return HttpResponse("Error at validation!")
    requests.get('https://' + ip)
    return HttpResponse("Request send!")
Line 5 reads an IP address supplied by the user, and line 7 uses a blacklist to check whether the IP is a local address, in order to prevent a possible SSRF vulnerability. The blacklist is incomplete and serves only as an example.
Line 9 checks whether the IP is an IPv4 address and normalizes it at the same time. After this validation, line 12 sends the actual request to the IP.
However, an attacker can pass an IP address such as 127.0.001, which is not found in the blacklist on line 7. Line 9 then uses ipaddress.IPv4Address to normalize the IP to 127.0.0.1. The attacker can thus bypass the SSRF validator and send requests to local network addresses.
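A more robust ordering is to parse the address first and run the check on the parsed object instead of the raw string. The following is a minimal sketch; `is_forbidden` is a hypothetical helper, and the set of blocked ranges is an assumption that a real application would need to adapt.

```python
import ipaddress


def is_forbidden(ip_text):
    """Return True if ip_text is invalid or points at a local address."""
    try:
        ip = ipaddress.IPv4Address(ip_text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return True
    # Check the parsed object, not the raw string the user sent.
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_unspecified
        or ip.is_link_local
    )
```

With this ordering, 127.0.001 is rejected either because the interpreter refuses the leading zeros or because the normalized address is recognized as loopback.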
In Python < 3.7, the urllib.parse.parse_qsl function accepts both the ";" and "&" characters as separators for URL query variables. Interestingly, other languages do not recognize ";" as a separator.
The following example shows why this behavior can lead to vulnerabilities. Assume an infrastructure in which the front end is a PHP application and the back end is a Python application.
An attacker sends the following GET request to the PHP front end:
GET https://victim.com/?a=1;b=2
The PHP front end recognizes only one query parameter, "a", with the content "1;b=2". PHP does not treat ";" as a separator for query parameters. The front end then forwards the attacker's request directly to the internal Python application:
GET https://internal.backend/?a=1;b=2
If urllib.parse.parse_qsl is used, the Python application processes two query parameters, "a=1" and "b=2". This difference in query parameter parsing can lead to critical security vulnerabilities, such as the web cache poisoning vulnerability in Django (CVE-2021-23336).
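The two interpretations can be compared directly. This sketch assumes a Python version in which parse_qsl accepts the `separator` parameter, which was introduced with the fix for CVE-2021-23336; on older interpreters the keyword does not exist.

```python
from urllib.parse import parse_qsl

query = "a=1;b=2"

# Only "&" separates parameters, which matches the PHP front end's view.
as_frontend = parse_qsl(query, separator="&")

# Treating ";" as the separator reproduces the old, divergent parsing.
as_old_backend = parse_qsl(query, separator=";")
```

The first call yields a single parameter a with the value "1;b=2", while the second yields the two parameters a and b, which is exactly the mismatch between front end and back end described above.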
In this post, we introduced 10 Python security pitfalls that we believe are not well known among developers. Each of these subtle pitfalls is easy to overlook and has led to security vulnerabilities in real-world applications in the past.
As shown above, security pitfalls can occur in all kinds of operations, from handling files, directories, archives, URLs, and IP addresses to simple strings. A common pattern is the use of library functions that behave in unexpected ways. This is a reminder to always upgrade to the latest version and to read the documentation carefully. At SonarSource, we research these pitfalls in order to continuously improve our code analyzers.