Source: https://zhuanlan.zhihu.com/p/442935082
Python is very convenient to write, but when a program contains many for loops its execution speed becomes frustrating. The reason is that Python is a dynamically typed language: data types are checked only at runtime, which is inefficient, especially in large for loops.
By contrast, C/C++ fixes the type of every variable in advance and is compiled into a binary executable. Compared with Python, C/C++ is far more efficient and runs large for loops quickly.
Since Python's weakness is speed, can we speed it up by calling C/C++ code from Python?
When we write Python code, what we get is a text file with a .py extension. To run the code, a Python interpreter is needed to execute the .py file.
(So what, exactly, does "Python code" mean? Let's spell it out.)
After downloading and installing Python from the official Python website, we get the official interpreter: CPython. It is written in C, hence the name. Running python on the command line starts the CPython interpreter. CPython is the most widely used Python interpreter.
Although CPython itself is slow, it is very well suited to calling C/C++ code. Libraries such as numpy are largely written in C/C++. This way we get both Python's simple syntax and the execution speed of C/C++. In some cases numpy is even faster than C/C++ code you write yourself, because numpy takes advantage of CPU instruction-set optimizations and multi-core parallelism.
Everything discussed here about calling C/C++ from Python is based on the CPython interpreter.
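Since everything that follows assumes CPython, it can help to check which implementation is actually running a script. The sketch below uses only the standard library; the variable names are illustrative and not part of the original article.

```python
import platform
import sys

# Name of the running implementation: "CPython", "PyPy", "Jython", "IronPython", ...
impl_name = platform.python_implementation()

# The same information as a lower-case identifier, e.g. "cpython" or "pypy".
impl_tag = sys.implementation.name

print(f"running on {impl_name} ({impl_tag}), Python {platform.python_version()}")

# Extension modules built with Cython or pybind11 target the CPython C API,
# so a script may want to fail early with a clear message on other interpreters.
is_cpython = impl_name == "CPython"
```

A build or launcher script could test is_cpython before trying to import a compiled module.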
IronPython is similar to Jython, except that IronPython is a Python interpreter running on the Microsoft .NET platform; it compiles Python code directly into .NET bytecode. The drawback is that commonly used libraries such as numpy are written in C/C++, so using such third-party libraries from IronPython is very inconvenient. (Microsoft has since stopped updating IronPython.)
Jython is a Python interpreter running on the Java platform; it compiles Python code directly into Java bytecode for execution. Its advantage is that it can call Java libraries; its drawback is the same as IronPython's.
PyPy is a Python interpreter that is itself written in Python (more precisely, in a restricted subset called RPython). Its goal is execution speed. PyPy uses JIT technology to compile Python code dynamically (note: compile, not interpret), so it can significantly speed up Python code.
Suppose we have a simple Python function
def add(x, y):
    return x + y
CPython then executes it roughly like this (pseudocode):
if instance_has_method(x, '__add__') {
    // x.__add__ itself contains many checks on the type of y
    return call(x, '__add__', y);
} else if instance_has_method(super_class(x), '__add__') {
    return call(super_class(x), '__add__', y);
} else if isinstance(x, str) and isinstance(y, str) {
    return concat_str(x, y);
} else if isinstance(x, float) and isinstance(y, float) {
    return add_float(x, y);
} else if isinstance(x, int) and isinstance(y, int) {
    return add_int(x, y);
} else ...
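The runtime dispatch sketched in the pseudocode can be observed from Python itself. In the example below, the Meters class is a hypothetical toy type invented for illustration; the point is only that one and the same add function ends up doing four different kinds of work.

```python
import dis

def add(x, y):
    return x + y

class Meters:
    """Toy type (hypothetical, for illustration) that defines its own addition."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # lets the interpreter try other.__radd__ instead

# One function, one bytecode sequence, many runtime behaviours.
int_sum = add(1, 2)                           # integer addition
float_sum = add(1.5, 2.25)                    # floating-point addition
str_sum = add("py", "thon")                   # string concatenation
custom_sum = add(Meters(1), Meters(2)).value  # user-defined __add__

# The bytecode contains a single generic "add" instruction (named BINARY_ADD
# or BINARY_OP depending on the CPython version); the type-specific work is
# selected only when that instruction runs.
dis.dis(add)
```

The disassembly differs between CPython versions, so no exact output is shown here.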
Because Python is dynamically typed, even a simple function has to go through many type checks. And that is not all. You might think that adding two integers is just x + y as in C. It is not.
Everything in Python is an object. Internally, a Python int looks roughly like this structure (pseudocode).
struct {
    prev_gc_obj *obj
    next_gc_obj *obj
    type IntType
    value IntValue
    ... other fields
}
Every int is such a structure, allocated dynamically on the heap, and its value cannot be changed in place. So adding 1000 of these structures to 1000 others means allocating 2000 structures on the heap, and once the results are no longer needed the memory has to be reclaimed as well. (With this much work per addition, it cannot be fast.)
Therefore, if the code can be compiled statically with the types of the variables known when + is executed, execution speed improves greatly.
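The per-object overhead described above can be seen directly on CPython. This is a small sketch with illustrative variable names; the exact size reported depends on the platform and Python version.

```python
import sys

# Size in bytes of one int object, object header included.
# A raw C int is usually 4 bytes; on 64-bit CPython this is typically 28.
int_size = sys.getsizeof(1000)

a = 10 ** 20
b = a + 1   # the addition allocates a brand-new int object
c = a + 1   # ...and so does this one, although the value is the same

same_value = (b == c)    # equal values
same_object = (b is c)   # but two separate heap objects on CPython

print(f"sizeof(int) = {int_size} bytes, same value: {same_value}, same object: {same_object}")
```

Every arithmetic result being a fresh heap object is exactly the cost that static C types avoid.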
Cython is a programming language whose syntax is based on Python, with some C/C++ syntax added. For instance, in Cython you can declare variable types, use the C++ STL (for example std::vector), or call your own C/C++ functions.
Note: Cython is not CPython!
Suppose we have a file RawPython.py:
from math import sqrt
import time
def func(n):
    res = 0
    for i in range(1, n):
        res = res + 1.0 / sqrt(i)
    return res
def main():
    start = time.time()
    res = func(30000000)
    print(f"res = {res}, use time {time.time() - start:.5}")
if __name__ == '__main__':
    main()
First, let's run it with plain Python to see how long it takes. On my computer it takes 4 seconds.
To use Cython, a Cython program is first translated into a .c/.cpp file, which a C/C++ compiler then compiles into a binary. On Windows we need to install Visual Studio, MinGW, or similar. On Linux or Mac we need gcc, clang, or similar.
pip install cython
Rename RawPython.py to RawPython1.pyx.
(1) Compile with setup.py
Add a setup.py with the following content. Here language_level=3 means that Python 3 is used.
# setuptools is used here because distutils was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize('RawPython1.pyx', language_level=3)
)
Compile the Python code into binary code:
python setup.py build_ext --inplace
Afterwards, the current directory contains two new files: RawPython1.c (generated from the .pyx) and RawPython1.pyd (the binary compiled from the .c file; on Linux or Mac the extension is .so instead).
(2) Compile directly from the command line (using gcc as an example)
cython RawPython1.pyx
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python3.x -o RawPython1.so RawPython1.c
The first command converts the .pyx into .c; the second uses gcc to compile and link.
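The include directory passed with -I and the file name of the resulting module depend on the local Python installation. As a hedged sketch, the standard sysconfig module can report both; the variable names below are illustrative.

```python
import sysconfig

# Directory that contains Python.h; this is what the -I option must point at.
include_dir = sysconfig.get_paths()["include"]

# Platform-specific file-name suffix for extension modules, for example
# ".cpython-311-x86_64-linux-gnu.so" or ".cp311-win_amd64.pyd".
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

# "RawPython1" is the module name used in this article.
print(f"-I{include_dir}")
print(f"output file: RawPython1{ext_suffix}")
```

A plain RawPython1.so, as produced by the gcc command above, is importable as well; the longer suffix is simply what the setup.py route generates.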
python -c "import RawPython1; RawPython1.main()"
We can import the compiled RawPython1 module and call it from Python.
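To compare the compiled module against the original fairly, a small timing harness helps. The sketch below is an assumption about how one might do it, not the article's own method: load_func and timed are hypothetical helpers, and the compiled module is simply skipped when it has not been built.

```python
import importlib
import time
from math import sqrt

def func_py(n):
    """Pure-Python reference, the same computation as func() in RawPython.py."""
    res = 0.0
    for i in range(1, n):
        res = res + 1.0 / sqrt(i)
    return res

def load_func(module_name):
    """Return module.func if the compiled module can be imported, else None."""
    try:
        return importlib.import_module(module_name).func
    except ImportError:
        return None

def timed(f, n):
    """Run f(n) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = f(n)
    return result, time.perf_counter() - start

n = 200_000  # small on purpose so the pure-Python run finishes quickly
candidates = {"pure Python": func_py, "RawPython1": load_func("RawPython1")}
timings = {}
for name, f in candidates.items():
    if f is None:
        print(f"{name}: not built, skipped")
        continue
    result, seconds = timed(f, n)
    timings[name] = seconds
    print(f"{name}: res = {result:.6f}, time = {seconds:.5f}s")
```

time.perf_counter is used instead of time.time because it is intended for measuring short intervals.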
Judging from the results of the steps above, the improvement is modest: only about twice as fast. Besides being interpreted, the main reason Python runs slowly is that it is dynamically typed: the type of a variable is not known before runtime, so even after compilation to binary code it will not be very fast. To go further we need to use Cython in more depth: the way to speed Python up here is to use Cython to specify the data types.
Specify the variable type
The benefit of Cython is that, as in C, a type can be assigned to a variable explicitly. So in the Cython function we declare the types of the loop variable and the accumulator.
After that, we will switch the square root to C's sqrt.
def func(int n):
    cdef double res = 0
    cdef int i, num = n
    for i in range(1, num):
        res = res + 1.0 / sqrt(i)
    return res
However, math.sqrt in Python returns a Python float object, which is still inefficient.
So can we use the C sqrt function instead? Certainly.
Cython wraps some common C functions and C++ classes so that they can be called directly from Cython.
We replace the line at the top
from math import sqrt
with
from libc.math cimport sqrt
Compiling and running as before, the speed improves a lot.
The complete code after the change is as follows:
import time
from libc.math cimport sqrt
def func(int n):
    cdef double res = 0
    cdef int i, num = n
    for i in range(1, num):
        res = res + 1.0 / sqrt(i)
    return res
def main():
    start = time.time()
    res = func(30000000)
    print(f"res = {res}, use time {time.time() - start:.5}")
if __name__ == '__main__':
    main()
Calling C/C++ from Cython
Since C/C++ is more efficient, can we call C/C++ from Cython? Simply rewrite this function in C and then call it from Cython.
First write the corresponding C version:
usecfunc.h
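#pragma once
#include <math.h>
/* Same computation as func() in the Python version: sum of 1/sqrt(i) for i in [1, n). */
double c_func(int n)
{
    int i;
    double result = 0.0;
    for (i = 1; i < n; i++)
        result = result + 1.0 / sqrt(i);
    return result;
}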
Then, in Cython, we import this header file and call the function:
cdef extern from "usecfunc.h":
    double c_func(int n)
import time
def func(int n):
    return c_func(n)
def main():
    start = time.time()
    res = func(30000000)
    print(f"res = {res}, use time {time.time() - start:.5}")
Using numpy in Cython
We can use numpy in Cython. However, if elements are accessed directly by array subscript, the type of the numpy data still has to be determined dynamically, which is relatively inefficient.
import numpy as np
cimport numpy as np
from libc.math cimport sqrt
import time
def func(int n):
    cdef np.ndarray arr = np.empty(n, dtype=np.float64)
    cdef int i, num = n
    for i in range(1, num):
        arr[i] = 1.0 / sqrt(i)
    return arr
def main():
    start = time.time()
    res = func(30000000)
    print(f"len(res) = {len(res)}, use time {time.time() - start:.5}")
Explanation:
cimport numpy as np
This line gives us access to numpy's C/C++ interface (specifying data types, array dimensions, and so on).
import numpy as np
This line gives us access to numpy's Python interface (np.array, np.linspace, etc.). Cython resolves the ambiguity internally, so users do not need to use different names.
When compiling, we also need to modify setup.py to bring in numpy's header files.
# setuptools is used here because distutils was removed in Python 3.12
from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy as np
setup(ext_modules=cythonize(
    Extension("RawPython4", ["RawPython4.pyx"], include_dirs=[np.get_include()]),
    language_level=3)
)
Speed up! Speed up!
The code above can be accelerated further:
(1) Specify the data type and number of dimensions of the numpy array. This removes the need to determine the data type dynamically; the generated code accesses elements by subscript the way C does.
(2) Turn off negative indexing. Python lets arr[-i] refer to the i-th element from the end, and supporting this takes an extra if...else check on every access. If we do not use this feature, it can be turned off.
The final accelerated program is as follows:
import numpy as np
cimport numpy as np
from libc.math cimport sqrt
import time
cimport cython
@cython.boundscheck(False)       # turn off array bounds checking
@cython.wraparound(False)        # turn off negative indexing
@cython.cdivision(True)          # use C division semantics (no division-by-zero check)
@cython.initializedcheck(False)  # turn off the check that memoryviews are initialized
def func(int n):
    cdef np.ndarray[np.float64_t, ndim=1] arr = np.empty(n, dtype=np.float64)
    cdef int i, num = n
    for i in range(1, num):
        arr[i] = 1.0 / sqrt(i)
    return arr
def main():
    start = time.time()
    res = func(30000000)
    print(f"len(res) = {len(res)}, use time {time.time() - start:.5}")
cdef np.ndarray[np.float64_t, ndim=1] arr = np.empty(n, dtype=np.float64)
This line means that when creating the numpy array we manually specify the element type and the number of dimensions.
The decorators above turn off bounds checking, negative indexing, the division-by-zero check, and the memoryview initialization check for this one function. They can also be set globally by adding comments at the top of the .pyx file:
# cython: boundscheck=False
# cython: wraparound=False
# cython: cdivision=True
# cython: initializedcheck=False
They can also be applied to a single block:
with cython.cdivision(True):
    pass  # do something here
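To make concrete what these directives switch off, the plain-Python sketch below shows the default behaviours that Cython otherwise has to preserve. It is an illustration only: the C results are emulated with math.trunc and math.fmod rather than produced by compiled code.

```python
import math

data = [10.0, 20.0, 30.0]

# wraparound: a negative index counts from the end of the sequence.
last = data[-1]

# boundscheck: an out-of-range index raises instead of reading stray memory.
try:
    data[3]
    bounds_error = False
except IndexError:
    bounds_error = True

# cdivision: Python raises on division by zero...
try:
    1.0 / 0.0
    zero_error = False
except ZeroDivisionError:
    zero_error = True

# ...and rounds integer division toward negative infinity,
# whereas C truncates toward zero (emulated here).
python_quotient, python_remainder = -7 // 2, -7 % 2
c_quotient, c_remainder = math.trunc(-7 / 2), int(math.fmod(-7, 2))

print(python_quotient, python_remainder, c_quotient, c_remainder)
```

With the checks disabled, an out-of-range or negative index is no longer caught, so the directives are only safe when the loop bounds are known to be valid.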
Other
Cython has absorbed a lot of C/C++ syntax, including pointers and references. A struct/class can also be passed from C++ to Cython.
Cython's syntax is similar to Python's while adding some C/C++ features, such as declaring variable types. Cython can also call C/C++ functions.
A characteristic of Cython is that without declared variable types, execution efficiency is about the same as Python's; once types are declared, it becomes considerably higher.
For more, see the official Cython documentation (Welcome to Cython's Documentation): https://docs.cython.org/en/latest/index.html
Cython is a Python-like language, whereas pybind11 is based on C++. We include pybind11 in a .cpp file, define the entry points for Python, and then compile and run.
The official website describes several features of pybind11.
pybind11 can be installed by running pip install pybind11 (the almighty pip).
It can also be installed with Visual Studio + vcpkg + CMake.
#include <pybind11/pybind11.h>
namespace py = pybind11;
int add_func(int i, int j) {
    return i + j;
}
PYBIND11_MODULE(example, m) {
    m.doc() = "pybind11 example plugin"; // optional: describes what this module does
    m.def("add_func", &add_func, "A function which adds two numbers");
}
First include the pybind11 header, then declare the module with PYBIND11_MODULE. Its first argument, example, is the module name that Python later uses:
import example
Each function is registered with m.def:
m.def("name to call from Python", &actual_function, "description of the function"); // the description is optional
pybind11 is header-only, so adding the corresponding header to the code is all that is needed to use it.
#include <pybind11/pybind11.h>
On Linux, the following command compiles it:
c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) example.cpp -o example$(python3-config --extension-suffix)
We can also compile with setup.py (on Windows this needs Visual Studio, MinGW, or similar; on Linux or Mac it needs gcc, clang, or similar).
from setuptools import setup, Extension
import pybind11
functions_module = Extension(
    name='example',
    sources=['example.cpp'],
    include_dirs=[pybind11.get_include()],
)
setup(ext_modules=[functions_module])
Then run the following command to compile:
python setup.py build_ext --inplace
Call it from Python:
python -c "import example; print(example.add_func(200, 33))"
With a small change to the code, Python can be told the parameter names:
m.def("add", &add, "A function which adds two numbers", py::arg("i"), py::arg("j"));
Default arguments can be specified as well. For the C++ function
int add(int i = 1, int j = 2) {
    return i + j;
}
the defaults are declared again in PYBIND11_MODULE:
m.def("add", &add, "A function which adds two numbers", py::arg("i") = 1, py::arg("j") = 2);
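From the Python side, a binding registered with named arguments and defaults is meant to be called like an ordinary Python function with keyword parameters. The sketch below is a pure-Python stand-in, not the compiled module; it only illustrates the calling conventions that py::arg("i") = 1 and py::arg("j") = 2 are intended to enable.

```python
import inspect

def add(i=1, j=2):
    """A function which adds two numbers (pure-Python stand-in for the binding)."""
    return i + j

default_result = add()           # both defaults are used
keyword_result = add(j=10)       # i keeps its default value
swapped_result = add(j=1, i=5)   # keyword order does not matter

# The signature a caller sees for this stand-in.
signature_text = str(inspect.signature(add))
print(signature_text, default_result, keyword_result, swapped_result)
```

Without py::arg, the bound function would accept positional arguments only, so add(j=10) would not be possible.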
Variables can also be exported from the module as attributes:
PYBIND11_MODULE(example, m) {
    m.attr("the_answer") = 23333;
    py::object world = py::cast("World");
    m.attr("what") = world;
}
A string has to be converted to a Python object with py::cast.
Then, in Python, the_answer and what can be accessed:
>>> import example
>>> example.the_answer
23333
>>> example.what
'World'
Because everything in Python is an object, py::object can hold Python variables, methods, modules, and so on.
py::object os = py::module_::import("os");
py::object makedirs = os.attr("makedirs");
makedirs("/tmp/path/to/somewhere");
This is equivalent to running the following in Python:
import os
makedirs = os.makedirs
makedirs("/tmp/path/to/somewhere")
A Python list can be passed in directly:
void print_list(py::list my_list) {
    for (auto item : my_list)
        py::print(item);
}
PYBIND11_MODULE(example, m) {
    m.def("print_list", &print_list, "function to print list", py::arg("my_list"));
}
Running it from Python:
>>> import example
>>> result = example.print_list([2, 23, 233])
2
23
233
>>> print(result)
None
The function can also take a std::vector<int> as its parameter. Why does this work? pybind11 automatically copy-constructs a std::vector<int> from the Python list object, and on return automatically converts the std::vector back into a Python list. The code is as follows:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
std::vector<int> print_list2(std::vector<int> & my_list) {
    auto x = std::vector<int>();
    for (auto item : my_list){
        x.push_back(item + 233);
    }
    return x;
}
PYBIND11_MODULE(example, m) {
    m.def("print_list2", &print_list2, "help message", py::arg("my_list"));
}
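Because the list is copied into a std::vector<int> on the way in and the returned vector is converted into a new list on the way out, changes made on the C++ side do not reach the caller's list. The following pure-Python stand-in (not the compiled module) mimics that behaviour so the consequence is easy to see; the explicit copy and the append are illustrative additions.

```python
def print_list2(my_list):
    """Pure-Python stand-in for the C++ print_list2 above."""
    copied = list(my_list)                    # stands in for the list -> std::vector<int> copy
    result = [item + 233 for item in copied]  # same arithmetic as the C++ loop
    copied.append(-1)                         # modifying the copy does not touch the caller's list
    return result                             # a fresh list, like the converted return value

original = [2, 23, 233]
returned = print_list2(original)
print(original, returned)
```

If the C++ code needs to modify the caller's list in place, taking a py::list parameter, as in print_list, avoids the copy.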
numpy is very handy, so being able to pass a numpy array as a parameter to pybind11 would be great. The code is as follows (it is rather long):
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
    py::buffer_info buf1 = input1.request(), buf2 = input2.request();
    if (buf1.ndim != 1 || buf2.ndim != 1)
        throw std::runtime_error("Number of dimensions must be one");
    if (buf1.size != buf2.size)
        throw std::runtime_error("Input shapes must match");
    /* No pointer is passed, so NumPy will allocate the buffer */
    auto result = py::array_t<double>(buf1.size);
    py::buffer_info buf3 = result.request();
    double *ptr1 = (double *) buf1.ptr,
           *ptr2 = (double *) buf2.ptr,
           *ptr3 = (double *) buf3.ptr;
    for (size_t idx = 0; idx < buf1.shape[0]; idx++)
        ptr3[idx] = ptr1[idx] + ptr2[idx];
    return result;
}
// inside PYBIND11_MODULE(example, m):
m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
It first takes the raw pointers out of the numpy arrays and then operates on the pointers.
Testing in Python:
>>> import example
>>> import numpy as np
>>> x = np.ones(3)
>>> y = np.ones(3)
>>> z = example.add_arrays(x, y)
>>> print(type(z))
<class 'numpy.ndarray'>
>>> print(z)
[2. 2. 2.]
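The py::buffer_info returned by request() describes the array through Python's buffer protocol, and the same description can be inspected from Python with a memoryview. The sketch below uses the standard array module instead of numpy so that it runs without extra packages; add_arrays here is a pure-Python stand-in with equivalent checks, not the compiled function.

```python
from array import array

values = array("d", [1.0, 1.0, 1.0])  # contiguous C doubles, like a float64 array
view = memoryview(values)

# These attributes correspond to fields of the same names in py::buffer_info.
ndim = view.ndim          # number of dimensions
shape = view.shape        # number of elements per dimension
itemsize = view.itemsize  # bytes per element
fmt = view.format         # struct-style type code; "d" means C double

def add_arrays(a, b):
    """Stand-in for the C++ add_arrays, with equivalent validity checks."""
    va, vb = memoryview(a), memoryview(b)
    if va.ndim != 1 or vb.ndim != 1:
        raise RuntimeError("Number of dimensions must be one")
    if va.shape != vb.shape:
        raise RuntimeError("Input shapes must match")
    return array("d", (x + y for x, y in zip(va, vb)))

total = add_arrays(values, values)
print(ndim, shape, itemsize, fmt, list(total))
```

The C++ version works on the raw pointer instead of iterating over Python objects, which is where its speed comes from.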
The complete example.cpp:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
namespace py = pybind11;
int add_func(int i, int j) {
    return i + j;
}
void print_list(py::list my_list) {
    for (auto item : my_list)
        py::print(item);
}
std::vector<int> print_list2(std::vector<int> & my_list) {
    auto x = std::vector<int>();
    for (auto item : my_list){
        x.push_back(item + 233);
    }
    return x;
}
py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
    py::buffer_info buf1 = input1.request(), buf2 = input2.request();
    if (buf1.ndim != 1 || buf2.ndim != 1)
        throw std::runtime_error("Number of dimensions must be one");
    if (buf1.size != buf2.size)
        throw std::runtime_error("Input shapes must match");
    /* No pointer is passed, so NumPy will allocate the buffer */
    auto result = py::array_t<double>(buf1.size);
    py::buffer_info buf3 = result.request();
    double *ptr1 = (double *) buf1.ptr,
           *ptr2 = (double *) buf2.ptr,
           *ptr3 = (double *) buf3.ptr;
    for (size_t idx = 0; idx < buf1.shape[0]; idx++)
        ptr3[idx] = ptr1[idx] + ptr2[idx];
    return result;
}
PYBIND11_MODULE(example, m) {
    m.doc() = "pybind11 example plugin"; // optional: describes what this module does
    m.def("add_func", &add_func, "A function which adds two numbers");
    m.attr("the_answer") = 23333;
    py::object world = py::cast("World");
    m.attr("what") = world;
    m.def("print_list", &print_list, "function to print list", py::arg("my_list"));
    m.def("print_list2", &print_list2, "help message", py::arg("my_list2"));
    m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
}
pybind11 is used on the C++ side to provide a C++ interface for Python programs. pybind11 also supports passing in objects such as Python lists and numpy arrays.
For more, see the official pybind11 documentation:
https://pybind11.readthedocs.io/en/stable/
Developing in Python is relatively simple; developing in C++ is more of a hassle.
When writing Python, we can use a profiler or other timing-analysis tools to find the most time-consuming blocks of code and optimize just those in C++. There is no need to optimize everything.
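As one possible way to find such hot spots, the standard library's cProfile can be used. The functions below (hot_loop, cheap_setup, program) are hypothetical examples made up for this sketch; in a real project the profiler would wrap the actual entry point.

```python
import cProfile
import io
import pstats
from math import sqrt

def hot_loop(n):
    """Deliberately expensive: the kind of function worth moving to C/C++."""
    res = 0.0
    for i in range(1, n):
        res += 1.0 / sqrt(i)
    return res

def cheap_setup():
    """Cheap by comparison; not worth optimizing."""
    return list(range(100))

def program():
    cheap_setup()
    return hot_loop(200_000)

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Sort by cumulative time and keep only the top entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

In the report, hot_loop accounts for nearly all of the time, which marks it as the part to rewrite with Cython or pybind11.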
Cython and pybind11 do only three things: speed up, speed up, and speed up. Where the computation is heavy and time-consuming, we can implement it in C/C++, which improves the execution speed of the whole Python program.
There are other ways to speed up Python, such as replacing for loops with vectorized numpy operations, or using JIT (just-in-time) compilation.