
Calling C/C++ from Python: Cython and pybind11


Source: https://zhuanlan.zhihu.com/p/442935082

Python is very convenient to write, but when faced with a large number of for loops, its execution speed becomes a problem. The reason is that Python is a dynamically typed language: type checks happen only at run time, which is very inefficient (especially in large for loops).

In contrast, C/C++ requires the type of every variable to be declared in advance and produces a binary executable through compilation. Compared with Python, C/C++ is efficient, and large for loops run fast.

Since Python's weakness is speed, can we speed it up by calling C/C++ code from Python?

Python Interpreter

When we write Python code, what we get is a text file with the .py extension that contains Python code. To run the code, we need a Python interpreter to execute the .py file.

(In other words, something has to translate the Python code for the machine.)

CPython

When we download and install Python from the official Python website, we get the official interpreter: CPython. This interpreter is written in C, hence the name CPython. Running python on the command line starts the CPython interpreter. CPython is the most widely used Python interpreter.

Although CPython itself is not very efficient, it works very well for calling C/C++ code. Libraries such as numpy are largely written in C/C++. This way we get Python's simple syntax together with the execution speed of C/C++. In some cases numpy is even faster than C/C++ code you write yourself, because numpy takes advantage of CPU instruction-set optimizations and multi-core parallelism.

Everything we discuss today about calling C/C++ from Python is based on the CPython interpreter.

IronPython

IronPython is similar to Jython, except that IronPython is a Python interpreter that runs on the Microsoft .NET platform and compiles Python code directly into .NET bytecode. The drawback is that, because numpy and other commonly used libraries are written in C/C++, using third-party libraries such as numpy from IronPython is very inconvenient. (Microsoft has since stopped updating IronPython.)

Jython

Jython is a Python interpreter that runs on the Java platform and compiles Python code directly into Java bytecode for execution. Its advantage is that it can call Java libraries; its drawbacks are the same as IronPython's.

PyPy

PyPy is a Python interpreter that is itself written in Python, i.e. Python is used to interpret .py files. Its goal is execution speed. PyPy uses JIT technology to compile Python code dynamically (note: compiling, not interpreting), so it can significantly improve the execution speed of Python code.
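Since everything that follows assumes CPython, it can be useful to check which interpreter is actually running. The snippet below is a minimal sketch that uses only the standard library; the variable names are chosen for illustration.

```python
import platform

# Name of the running interpreter, e.g. 'CPython', 'PyPy', 'Jython' or 'IronPython'.
implementation = platform.python_implementation()
print(f"interpreter: {implementation}")
print(f"version:     {platform.python_version()}")

# The Cython and pybind11 examples in this article assume CPython,
# so this flag is a quick sanity check before trying to build them.
is_cpython = implementation == "CPython"
print(f"running on CPython: {is_cpython}")
```

If the last line reports False, the build steps described in this article may need adjusting for that interpreter.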

Why dynamic interpretation is slow

Suppose we have a simple Python function:

    def add(x, y):
        return x + y

CPython then executes it roughly like this (pseudocode):

    if instance_has_method(x, '__add__') {
        // x.__add__ itself contains many checks on the type of y
        return call(x, '__add__', y);
    } else if instance_has_method(super_class(x), '__add__') {
        return call(super_class(x), '__add__', y);
    } else if isinstance(x, str) and isinstance(y, str) {
        return concat_str(x, y);
    } else if isinstance(x, float) and isinstance(y, float) {
        return add_float(x, y);
    } else if isinstance(x, int) and isinstance(y, int) {
        return add_int(x, y);
    } else ...

Because Python is dynamically typed, even a simple function has to make many type checks. And that's not all: do you think adding two integers is just x + y as in C? No.
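The run-time dispatch described above can be observed from plain Python. The following is a small sketch: the same add function is called with several types, and each call ends up in a different __add__ implementation. The Meters class is a hypothetical type introduced only for this illustration.

```python
class Meters:
    """Hypothetical user-defined type with its own __add__."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # lets Python try other.__radd__, then raise TypeError


def add(x, y):
    return x + y


print(add(1, 2))                        # handled by int.__add__
print(add(1.5, 2))                      # handled by float.__add__
print(add("a", "b"))                    # string concatenation
print(add([1], [2]))                    # list concatenation
print(add(Meters(1), Meters(2)).value)  # user-defined __add__

# No suitable __add__ exists for int + str, which is only discovered at run time.
try:
    add(1, "a")
except TypeError as exc:
    print(f"TypeError: {exc}")
```

None of these decisions can be made ahead of time, because add has no idea what x and y will be until it is called.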

Everything in Python is an object. Internally, a Python int looks roughly like this structure (pseudocode):

    struct {
        prev_gc_obj *obj
        next_gc_obj *obj
        type IntType
        value IntValue
        ... other fields
    }

Every int is such a structure, dynamically allocated on the heap, and its value is immutable. That means adding 1000 of these structures to 1000 of these structures requires malloc-ing 2000 structures on the heap. After the results have been used, the memory must also be reclaimed. (With all this work, the speed is bound to suffer.)

Therefore, static compilation combined with explicitly specified variable types greatly improves execution speed.
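The object overhead of an int can be seen from Python itself. This is a minimal sketch that assumes CPython (sys.getsizeof may behave differently on other interpreters); the exact size printed depends on the platform and Python version.

```python
import sys

x = 1000
# On CPython an int is a full heap object, so it occupies far more than
# the 4 or 8 bytes a C integer would need.
int_size = sys.getsizeof(x)
print(f"type: {type(x).__name__}, size in bytes: {int_size}")

# ints are immutable: "changing" x binds the name to a different object,
# while y still refers to the original one.
y = x
x += 1
print(x, y, x is y)
```

Every arithmetic result in a pure-Python loop is a new object of this kind, which is the cost that static typing removes.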

Cython

What is Cython

Cython is a new programming language whose syntax is based on Python but also includes some C/C++ syntax. For example, in Cython you can specify variable types, use C++ STL containers (such as std::vector), or call your own C/C++ functions.

Note: Cython is not CPython!

Native Python

Suppose we have a file RawPython.py:

    from math import sqrt
    import time

    def func(n):
        res = 0
        for i in range(1, n):
            res = res + 1.0 / sqrt(i)
        return res

    def main():
        start = time.time()
        res = func(30000000)
        print(f"res = {res}, use time {time.time() - start:.5}")

    if __name__ == '__main__':
        main()

First, let's see how long it takes to run as plain Python. On my computer it takes about 4 seconds.

Compiling and running a Cython program

First, a Cython program is translated into a .c/.cpp file, which is then compiled into a binary with a C/C++ compiler. On Windows we need to install Visual Studio, MinGW, or a similar toolchain. On Linux or Mac we need gcc, clang, or similar.

Install Cython with pip:

    pip install cython

Rename RawPython.py to RawPython1.pyx.

There are two ways to compile it:

(1) Compile with setup.py

Add a setup.py with the following content. Here language_level=3 means that Python 3 syntax is used.

    # setuptools provides setup(); distutils was removed in Python 3.12
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        ext_modules=cythonize('RawPython1.pyx', language_level=3)
    )

Compile the Cython code into a binary:

    python setup.py build_ext --inplace

Afterwards, the current directory contains two new files: RawPython1.c (generated from the .pyx) and RawPython1.pyd (the binary compiled from the .c file; on Linux or Mac the extension is .so).

(2) Compile directly from the command line (using gcc as an example)

    cython RawPython1.pyx
    gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python3.x -o RawPython1.so RawPython1.c

The first command converts the .pyx into a .c file; the second compiles and links it with gcc.
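The include directory passed with -I and the file name of the compiled module depend on the local Python installation. As a hedged sketch, the standard-library sysconfig module can report both for the interpreter that is currently running; the variable names here are illustrative.

```python
import sysconfig

# Directory that contains Python.h for the running interpreter;
# this is what the -I flag of the gcc command has to point to.
include_dir = sysconfig.get_paths()["include"]

# File-name suffix the interpreter expects for compiled extension modules,
# e.g. something ending in ".so" on Linux/Mac or ".pyd" on Windows.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

print(f"-I{include_dir}")
print(f"RawPython1{ext_suffix}")
```

The printed values can then be substituted into the gcc command by hand.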

In the current directory, run

    python -c "import RawPython1; RawPython1.main()"

This imports the compiled RawPython1 module and calls it from Python.

Judging from the result, the improvement is modest: only about twice as fast. Besides being interpreted, the main reason Python runs slowly is that it is a dynamically typed language: the type of each variable is unknown before run time, so even code compiled into a binary will not be very fast. To accelerate Python further we need to dig deeper into Cython, namely use Cython to specify the data types.

Speed up! Speed up!

Specifying variable types

The benefit of Cython is that, as in C, we can explicitly assign types to variables. So in the Cython function we add types for the loop variable and the accumulator.

We start by adding the types; after that we will switch the square root to C's sqrt.

    def func(int n):
        cdef double res = 0
        cdef int i, num = n
        for i in range(1, num):
            res = res + 1.0 / sqrt(i)
        return res

However, Python's math.sqrt returns a Python float object, so this is still inefficient.

So can we use the sqrt function from C? Of course!

Cython wraps some common C functions and C++ classes so that they can be called directly from Cython.

We replace the line at the top

    from math import sqrt

with

    from libc.math cimport sqrt

Then we compile and run as described above and find that the speed has improved a lot.

The complete code after the change is as follows:

    import time
    from libc.math cimport sqrt

    def func(int n):
        cdef double res = 0
        cdef int i, num = n
        for i in range(1, num):
            res = res + 1.0 / sqrt(i)
        return res

    def main():
        start = time.time()
        res = func(30000000)
        print(f"res = {res}, use time {time.time() - start:.5}")

    if __name__ == '__main__':
        main()

Calling C/C++ from Cython

Since C/C++ is more efficient, can we call C/C++ from Cython? We simply rewrite this function in C and then call it from Cython.

First, write the corresponding C version:

usecfunc.h

    #pragma once
    #include <math.h>

    /* Same sum as the Python func: 1/sqrt(i) for i in [1, n) */
    double c_func(int n)
    {
        int i;
        double result = 0.0;
        for (i = 1; i < n; i++)
            result = result + 1.0 / sqrt(i);
        return result;
    }

Then, in Cython, we declare the function from this header file and call it:

    cdef extern from "usecfunc.h":
        double c_func(int n)

    import time

    def func(int n):
        return c_func(n)

    def main():
        start = time.time()
        res = func(30000000)
        print(f"res = {res}, use time {time.time() - start:.5}")

Using numpy in Cython

We can use numpy in Cython. However, if we access elements directly by array index, the data type of the numpy array still has to be determined dynamically, which is relatively inefficient.

    import numpy as np
    cimport numpy as np
    from libc.math cimport sqrt
    import time

    def func(int n):
        cdef np.ndarray arr = np.empty(n, dtype=np.float64)
        cdef int i, num = n
        for i in range(1, num):
            arr[i] = 1.0 / sqrt(i)
        return arr

    def main():
        start = time.time()
        res = func(30000000)
        print(f"len(res) = {len(res)}, use time {time.time() - start:.5}")

Explanation:

    cimport numpy as np

This line gives us access to numpy's C/C++ interface (for specifying data types, array dimensions, and so on).

    import numpy as np

This line gives us access to numpy's Python interface (np.array, np.linspace, etc.). Cython resolves the ambiguity between the two internally, so users do not need to use different names.

When compiling, we also need to modify setup.py to add the numpy header files.

    # setuptools provides setup() and Extension; distutils was removed in Python 3.12
    from setuptools import setup, Extension
    from Cython.Build import cythonize
    import numpy as np

    setup(ext_modules=cythonize(
        Extension("RawPython4", ["RawPython4.pyx"], include_dirs=[np.get_include()]),
        language_level=3)
    )

Speed up! Speed up!

The code above can be accelerated further:

You can specify the data type and number of dimensions of the numpy array, which removes the need to determine the data type dynamically. The generated code then accesses elements by array index just as C does.
When a numpy array is indexed, the index is also checked against the array bounds. If we are sure our program never goes out of bounds, we can turn bounds checking off.
Python also supports negative indexing, i.e. the i-th element counted from the end. Supporting negative indices requires an extra if...else check. If we don't use this feature, we can turn it off as well.
Python also checks for division by zero. We never divide by zero here, so we turn that off.
Other related checks are turned off as well.
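To make the checks in the list above concrete, here is a small pure-Python sketch of the behaviour that is being switched off. It uses the standard-library array type as a stand-in for a numpy array (an assumption made so that the example has no dependencies); numpy arrays behave the same way for these two index cases.

```python
from array import array

values = array("d", [1.0, 2.0, 3.0])

# Negative indexing ("wraparound"): -1 means the last element.
last = values[-1]

# Bounds checking: an out-of-range index raises instead of reading stray memory.
try:
    values[3]
    out_of_bounds_caught = False
except IndexError:
    out_of_bounds_caught = True

# Division by zero is checked as well.
try:
    1.0 / 0.0
    zero_division_caught = False
except ZeroDivisionError:
    zero_division_caught = True

# Python's integer division floors, while C truncates toward zero;
# cdivision(True) switches Cython to the C behaviour.
python_style = -7 // 2   # floor division
c_style = int(-7 / 2)    # truncation toward zero, like -7 / 2 in C

print(last, out_of_bounds_caught, zero_division_caught, python_style, c_style)
```

Each of these conveniences costs a branch per operation, which is why turning them off helps inside a tight loop. With the checks disabled, an out-of-range or negative index is no longer caught, so they should only be turned off for code that is known to be correct.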

The final accelerated program is as follows:

    import numpy as np
    cimport numpy as np
    from libc.math cimport sqrt
    import time
    cimport cython

    @cython.boundscheck(False)       # turn off array bounds checking
    @cython.wraparound(False)        # turn off negative indexing
    @cython.cdivision(True)          # turn off the division-by-zero check
    @cython.initializedcheck(False)  # turn off the check that a memory view is initialized
    def func(int n):
        cdef np.ndarray[np.float64_t, ndim=1] arr = np.empty(n, dtype=np.float64)
        cdef int i, num = n
        for i in range(1, num):
            arr[i] = 1.0 / sqrt(i)
        return arr

    def main():
        start = time.time()
        res = func(30000000)
        print(f"len(res) = {len(res)}, use time {time.time() - start:.5}")

    cdef np.ndarray[np.float64_t, ndim=1] arr = np.empty(n, dtype=np.float64)

This line means that when we create the numpy array, we manually specify the element type and the number of dimensions.

The decorators above turn off bounds checking, negative indexing, the division-by-zero check, and the memory-view initialization check for this one function. We can also set them globally by adding these comments at the top of the .pyx file:

    # cython: boundscheck=False
    # cython: wraparound=False
    # cython: cdivision=True
    # cython: initializedcheck=False

They can also be applied to a block of code:

    with cython.cdivision(True):
        # do something here

Other

Cython has absorbed a lot of C/C++ syntax, including pointers and references. You can also pass a struct/class from C++ to Cython.

Cython summary

Cython's syntax is similar to Python's, with some C/C++ features added, such as specifying variable types. Cython can also call C/C++ functions.

A characteristic of Cython is that if variable types are not specified, execution efficiency is about the same as Python's. Once types are specified, execution efficiency is considerably higher.

For more information, see the official Cython documentation:

Welcome to Cython's Documentation: https://docs.cython.org/en/latest/index.html

pybind11

Cython is a Python-like language, whereas pybind11 is based on C++. We include pybind11 in a .cpp file, define the entry point for Python, and then compile and run.

The official website describes several features of pybind11:

A lightweight header-only library
Goals and syntax similar to the excellent Boost.Python library
Used to bind C++ code to Python

Installation

pybind11 can be installed by running pip install pybind11 (the almighty pip).

It can also be installed with Visual Studio + vcpkg + CMake.

A simple example

    #include <pybind11/pybind11.h>
    namespace py = pybind11;

    int add_func(int i, int j) {
        return i + j;
    }

    PYBIND11_MODULE(example, m) {
        m.doc() = "pybind11 example plugin";  // optional: describes what this module does
        m.def("add_func", &add_func, "A function which adds two numbers");
    }

First include the pybind11 header file, then declare the module with PYBIND11_MODULE.

example: the module name; note that no quotation marks are needed. Afterwards you can run import example in Python.
m: can be understood as the module object, used to expose the interface to Python.
m.doc(): the help text.
m.def: registers a function and bridges it to Python.

    m.def("name of the method as called from Python", &actual_function, "description of the function");  // the description is optional

Compile & run

pybind11 is header-only, so just including the corresponding header in your code is enough to use pybind11.

    #include <pybind11/pybind11.h>

On Linux, you can compile with a command like this:

    c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) example.cpp -o example$(python3-config --extension-suffix)

We can also compile with setup.py (on Windows this requires Visual Studio, MinGW, or similar; on Linux or Mac, gcc, clang, or similar):

    from setuptools import setup, Extension
    import pybind11

    functions_module = Extension(
        name='example',
        sources=['example.cpp'],
        include_dirs=[pybind11.get_include()],
    )

    setup(ext_modules=[functions_module])

Then run the following command to compile:

    python setup.py build_ext --inplace

Call it from Python:

    python -c "import example; print(example.add_func(200, 33))"

Specifying function parameters in pybind11

With a small code change, you can tell Python the parameter names:

    m.def("add", &add, "A function which adds two numbers", py::arg("i"), py::arg("j"));

You can also specify default arguments:

    int add(int i = 1, int j = 2) {
        return i + j;
    }

The default arguments are then declared in PYBIND11_MODULE:

    m.def("add", &add, "A function which adds two numbers", py::arg("i") = 1, py::arg("j") = 2);
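From the Python side, a function bound with named parameters and defaults is meant to be callable like an ordinary Python function with keyword arguments. The sketch below is a pure-Python stand-in (not the compiled module) that mirrors the signature declared above, to show the calling conventions the binding is intended to support.

```python
def add(i=1, j=2):
    """A function which adds two numbers (pure-Python stand-in for the binding)."""
    return i + j


print(add())          # both defaults are used
print(add(j=10))      # keyword argument for j, default for i
print(add(i=3, j=4))  # both arguments given by name
```

Without py::arg, the bound function would accept positional arguments only, so calls such as add(j=10) would not be possible.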

Adding variables to the Python module

    PYBIND11_MODULE(example, m) {
        m.attr("the_answer") = 23333;
        py::object world = py::cast("World");
        m.attr("what") = world;
    }

A string needs to be converted to a Python object with py::cast.

Then, in Python, you can access the the_answer and what objects:

    >>> import example
    >>> example.the_answer
    23333
    >>> example.what
    'World'

Calling Python methods in a cpp file

Because everything in Python is an object, we can use py::object to hold Python variables, methods, modules, and so on.

    py::object os = py::module_::import("os");
    py::object makedirs = os.attr("makedirs");
    makedirs("/tmp/path/to/somewhere");

This is equivalent to running the following in Python:

    import os
    makedirs = os.makedirs
    makedirs("/tmp/path/to/somewhere")

Using a Python list with pybind11

We can pass in a Python list directly:

    void print_list(py::list my_list) {
        for (auto item : my_list)
            py::print(item);
    }

    PYBIND11_MODULE(example, m) {
        m.def("print_list", &print_list, "function to print list", py::arg("my_list"));
    }

Run it in Python:

    >>> import example
    >>> result = example.print_list([2, 23, 233])
    2
    23
    233
    >>> print(result)
    None

This function can also take a std::vector<int> as its parameter. Why does this work? pybind11 automatically copy-constructs a std::vector<int> from the Python list object, and on return automatically converts the std::vector back into a Python list. The code is as follows:

    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>

    std::vector<int> print_list2(std::vector<int> & my_list) {
        auto x = std::vector<int>();
        for (auto item : my_list) {
            x.push_back(item + 233);
        }
        return x;
    }

    PYBIND11_MODULE(example, m) {
        m.def("print_list2", &print_list2, "help message", py::arg("my_list"));
    }
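Because the list is copied into a std::vector<int> on the way in, the caller's list is not modified by the C++ function. The following pure-Python stand-in (an illustration, not the compiled module) sketches the same copy-in, copy-out behaviour.

```python
def print_list2(my_list):
    # The conversion to std::vector<int> copies the elements; list() plays that role here.
    copied = list(my_list)
    for index, item in enumerate(copied):
        copied[index] = item + 233  # modifying the copy leaves the caller's list alone
    return copied


data = [2, 23, 233]
result = print_list2(data)
print(result)
print(data)
```

The copy is convenient but not free: for very large lists the conversion itself can become a noticeable cost.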

Using numpy with pybind11

Because numpy is so useful, being able to pass a numpy array as an argument to pybind11 would be great. The code is as follows (it is a bit long):

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        py::buffer_info buf1 = input1.request(), buf2 = input2.request();
        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");
        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");
        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);
        py::buffer_info buf3 = result.request();
        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;
        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];
        return result;
    }

    // inside PYBIND11_MODULE(example, m) { ... }
    m.def("add_arrays", &add_arrays, "Add two NumPy arrays");

First we take the pointers out of the numpy arrays, then operate on the pointers.

We test it in Python as follows:

    >>> import example
    >>> import numpy as np
    >>> x = np.ones(3)
    >>> y = np.ones(3)
    >>> z = example.add_arrays(x, y)
    >>> print(type(z))
    <class 'numpy.ndarray'>
    >>> print(z)
    [2. 2. 2.]
The complete code:

    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>
    #include <pybind11/numpy.h>
    namespace py = pybind11;

    int add_func(int i, int j) {
        return i + j;
    }

    void print_list(py::list my_list) {
        for (auto item : my_list)
            py::print(item);
    }

    std::vector<int> print_list2(std::vector<int> & my_list) {
        auto x = std::vector<int>();
        for (auto item : my_list) {
            x.push_back(item + 233);
        }
        return x;
    }

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        py::buffer_info buf1 = input1.request(), buf2 = input2.request();
        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");
        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");
        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);
        py::buffer_info buf3 = result.request();
        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;
        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];
        return result;
    }

    PYBIND11_MODULE(example, m) {
        m.doc() = "pybind11 example plugin";  // optional: describes what this module does
        m.def("add_func", &add_func, "A function which adds two numbers");
        m.attr("the_answer") = 23333;
        py::object world = py::cast("World");
        m.attr("what") = world;
        m.def("print_list", &print_list, "function to print list", py::arg("my_list"));
        m.def("print_list2", &print_list2, "help message", py::arg("my_list2"));
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
    }

pybind11 summary

pybind11 is used from C++ and can provide a C++ interface to Python programs. pybind11 also supports passing in objects such as Python lists and numpy arrays.

For more information, see the official pybind11 documentation:

https://pybind11.readthedocs.io/en/stable/

Other ways to call C++ from Python

CPython ships with a Python.h header. We can include this header in C/C++ code and then compile it into a dynamic link library. However, using Python.h directly is somewhat cumbersome.
Boost is a C++ library that wraps Python.h, but the whole Boost library is huge and its documentation is not very friendly.
SWIG (Simplified Wrapper and Interface Generator) declares C/C++ functions and variables in a specific syntax. (TensorFlow used to use it, but has now switched to pybind11.)

Summary: when should we accelerate

Developing in Python is relatively simple; developing in C++ is more of a hassle.

When writing Python, we can use a profiler or other timing tools to find the time-consuming code blocks and optimize just those parts with C++. There is no need to optimize everything.
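As one possible way to find the time-consuming part, the standard-library cProfile and pstats modules can be used. The sketch below profiles a scaled-down version of the loop from earlier in the article next to a cheap helper; the helper cheap() and the smaller loop count are assumptions made for illustration.

```python
import cProfile
import io
import pstats
from math import sqrt


def func(n):
    # The hot loop from the article, with a smaller n so profiling stays quick.
    res = 0.0
    for i in range(1, n):
        res += 1.0 / sqrt(i)
    return res


def cheap():
    # A deliberately cheap function for comparison.
    return sum(range(100))


def main():
    cheap()
    return func(200_000)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In the report, func should account for nearly all of the time, which marks it as the candidate for a Cython or pybind11 rewrite while the rest of the program stays in Python.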

Cython and pybind11 do only three things: speed up, speed up, and speed up. The compute-heavy, time-consuming parts can be implemented in C/C++, which helps improve the execution speed of the whole Python program.

There are other ways to speed up Python as well, for example using numpy vectorized operations instead of for loops, or using JIT compilation.