This section is based on the official Numba documentation.
Programs written in compiled languages such as C/C++ are translated from source files into the machine language the computer uses, and the linker then produces a binary executable. When the program runs, that binary is loaded from disk into memory and executed. Python, on the other hand, is an interpreted language: there is no compilation step. Instead, the interpreter converts the source code into bytecode and then executes that bytecode, so with Python you do not have to worry about compiling the program or about linking and loading libraries.
A Python interpreter (such as CPython, IPython, PyPy, Jython, or IronPython) executes Python code in four stages. The common CPython interpreter uses C to interpret the bytecode, whereas Numba uses LLVM compilation techniques to compile the bytecode.
Cython, which we wrote about earlier, essentially accelerates code by using C to replace the complex machinery underneath the CPython interpreter (for example, Python's dynamic typing involves a great deal of type checking, polymorphism, and overflow checking, all of which cost time; with Cython's static types most of those problems go away). Numba takes a different approach: it compiles on top of the LLVM compiler framework, translating Python bytecode into LLVM intermediate representation (IR). Note that LLVM IR is a low-level language, similar in style to assembly, and has nothing to do with Python.
Numba is a just-in-time (JIT) compiler for Python, and it is best suited to code that uses NumPy arrays, NumPy functions, and loops. The most common way to use Numba is through its collection of decorators, which you apply to your functions to tell Numba to compile them. When a decorated function is called, it is compiled to machine code just in time for execution, and all or part of your code can subsequently run at native machine-code speed.
Whether it helps depends on what your code looks like. If your code is numerically oriented (does a lot of math), uses NumPy heavily, and/or contains many loops, Numba is usually a good choice. In the examples below we apply the most basic Numba JIT decorator, @jit, to try to speed up some functions, to demonstrate what works and what does not.
Numba is very effective for code like the following:
from numba import jit
import numpy as np
x = np.arange(100).reshape(10, 10)
@jit(nopython=True) # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting
print(go_fast(x))
Numba does not help much with code like the following:
from numba import jit
import pandas as pd
x = {'a': [1, 2, 3], 'b': [20, 30, 40]}
@jit
def use_pandas(a): # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!
print(use_pandas(x))
@numba.jit(signature=None, nopython=False, nogil=False, cache=False, forceobj=False, parallel=False, error_model='python', fastmath=False, locals={}, boundscheck=False)
All parameters of the jit decorator are optional; if you write just @jit, Numba decides on its own how to optimize.
First, a simple example:
from numba import jit
@jit
def f(x, y):
    # A somewhat trivial example
    return x + y
Here, both f(10, 3) and f(1j, 2) run fine. But what if you need to control the data types of the input and output parameters?
from numba import jit, int32, float32, double
@jit(double(float32, int32))
def f(x, y):
    # A somewhat trivial example
    return x + y
Here double(float32, int32) is the function signature: double specifies the data type of the return value, while float32 and int32 specify the types of x and y respectively. The return type may be omitted, in which case Numba infers it automatically.
Now, obviously, calling f(1j, 2) raises an error, which shows that the data types are indeed being enforced.
Common type signatures:
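The original table of signatures is not reproduced here; as a rough reference (these particular examples are mine, not the post's), signatures are built from the types exported by the numba module, for instance:

float64(float64, float64)    # two double-precision floats in, one out
int32(int32, int32)          # 32-bit integer arithmetic
void(float64[:], int64)      # a 1-D float64 array and an int64 in, nothing returned
complex128(complex128)       # complex numbers
float32[:, :](float32[:, :]) # a 2-D float32 array in and out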
nopython and forceobj: both arguments are booleans. nopython=True means the function is compiled in nopython mode, while forceobj=True forces compilation in object mode.
nopython mode: a Numba compilation mode that generates code that does not access the Python C API. This compilation mode produces the highest-performance code, but it requires that the native types of all values in the function can be inferred. Unless instructed otherwise, the @jit decorator automatically falls back to object mode if nopython mode cannot be used.
object mode: a Numba compilation mode that generates code which handles all values as Python objects and uses the Python C API to perform all operations on those objects. Code compiled in object mode usually runs no faster than interpreted Python code, unless the Numba compiler can take advantage of loop-jitting (extracting inner loops and compiling them in nopython mode).
In general, nopython mode is recommended; after all, the whole point of using Numba is to improve running speed, even though this mode imposes some restrictions on how the code is written.
nogil: if True, the global interpreter lock (GIL) is released, so multi-core systems can be used effectively, but this only works in nopython mode. You should also keep in mind the usual pitfalls of multithreaded programming (consistency, synchronization, race conditions, and so on).
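A minimal sketch (my own example, not from the post) of what releasing the GIL buys: a nogil-compiled function called from several Python threads can actually run in parallel:

import threading
import numpy as np
from numba import jit

@jit(nopython=True, nogil=True)
def partial_sum(a):              # compiled code that does not hold the GIL
    s = 0.0
    for i in range(a.shape[0]):
        s += a[i]
    return s

data = np.random.rand(1_000_000)
chunks = np.array_split(data, 4)
results = [0.0] * len(chunks)

def worker(idx, chunk):
    results[idx] = partial_sum(chunk)

threads = [threading.Thread(target=worker, args=(i, c)) for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))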
cache: if True, file-based caching is enabled to shorten compilation time when the function has already been compiled in a previous run.
parallel: if True, many common NumPy constructs are parallelized automatically and adjacent parallel operations are fused, which maximizes cache locality.
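A minimal sketch (my own example, not the post's) of parallel=True together with prange, which marks the loop whose iterations Numba may distribute across cores:

import numpy as np
from numba import jit, prange

@jit(nopython=True, parallel=True)
def row_sums(a):
    out = np.empty(a.shape[0])
    for i in prange(a.shape[0]):   # iterations of this loop may be split across cores
        s = 0.0
        for j in range(a.shape[1]):
            s += a[i, j]
        out[i] = s
    return out

print(row_sums(np.random.rand(1000, 100)))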
error_model: either 'python' or 'numpy'; it controls which library's error semantics are used when raising exceptions (for division by zero, 'python' raises an exception while 'numpy' produces inf/nan).
fastmath: enables the unsafe floating-point transformations described in the LLVM documentation. In addition, if Intel SVML is installed, faster but less precise versions of some math intrinsics are used.
locals: a mapping used to specify the Numba type of particular local variables.
boundscheck: whether to check array indexing against the bounds of the array; enabling it costs speed, so it is left off by default. A small sketch combining locals and boundscheck follows.
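A minimal sketch of these two options (my own toy example, not from the post): the accumulator is pinned to float32 via locals, and boundscheck=True makes out-of-range indexing raise an IndexError instead of reading garbage:

import numpy as np
from numba import jit, float32

@jit(nopython=True, locals={'acc': float32}, boundscheck=True)
def total(arr):
    acc = 0.0                      # inferred as float32 because of locals
    for i in range(arr.shape[0]):
        acc += arr[i]
    return acc

print(total(np.arange(5, dtype=np.float32)))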
@numba.generated_jit(nopython=False, nogil=False, cache=False, forceobj=False, locals={})
The generated_jit() decorator lets you provide different implementations of a function depending on the types of the arguments passed in, while retaining the speed of the jit() decorator.
# Returns whether the given value is a missing type
import numpy as np
from numba import generated_jit, types
@generated_jit(nopython=True)
def is_missing(x):
""" Return True if the value is missing, False otherwise. """
if isinstance(x, types.Float):
return lambda x: np.isnan(x)
elif isinstance(x, (types.NPDatetime, types.NPTimedelta)):
# The corresponding Not-a-Time value
missing = x('NaT')
return lambda x: x == missing
else:
return lambda x: False
@numba.vectorize(*, signatures=[], identity=None, nopython=True, target='cpu', forceobj=False, cache=False, locals={})
vectorize() compiles the decorated function and wraps it as a NumPy ufunc or a Numba DUFunc.
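A minimal sketch (my example, not the post's): an element-wise addition compiled into a ufunc, which then supports NumPy broadcasting like any built-in ufunc:

import numpy as np
from numba import vectorize, float64

@vectorize([float64(float64, float64)], nopython=True)
def add(a, b):
    return a + b

x = np.arange(4.0)
print(add(x, x))      # element-wise, like a NumPy ufunc
print(add(x, 10.0))   # broadcasting a scalar also works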
jitclass() compiles all the methods of a class in nopython mode.
import numpy as np
from numba import jitclass # import the decorator
from numba import int32, float32 # import the types
spec = [
    ('value', int32),      # a simple scalar field
    ('array', float32[:]), # an array field
]
@jitclass(spec)
class Bag(object):
    def __init__(self, value):
        self.value = value
        self.array = np.zeros(value, dtype=np.float32)

    @property
    def size(self):
        return self.array.size

    def increment(self, val):
        for i in range(self.size):
            self.array[i] = val
        return self.array
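A quick usage sketch (mine, not from the post): the compiled class is constructed and used like a normal Python object, but its methods run as machine code:

bag = Bag(5)
print(bag.size)            # 5
print(bag.increment(3.0))  # [3. 3. 3. 3. 3.]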
cfunc() creates a compiled function that can be called from external C code, which makes it possible to interact with libraries written in C or C++. Considering that many Python libraries are implemented in C or C++ underneath, this is very useful.
For example, the scipy.integrate.quad function accepts either an ordinary Python callback or a C callback wrapped in a ctypes callback object.
Using an ordinary Python callback:
import numpy as np
import scipy.integrate as si
def integrand(t):
    return np.exp(-t) / t ** 2

def do_integrate(func):
    """Integrate the given function from 1.0 to +inf."""
    return si.quad(func, 1, np.inf)
do_integrate(integrand)
Using the cfunc() decorator:
import numpy as np
import scipy.integrate as si
from numba import cfunc
def integrand(t):
    return np.exp(-t) / t ** 2

def do_integrate(func):
    """Integrate the given function from 1.0 to +inf."""
    return si.quad(func, 1, np.inf)
nb_integrand = cfunc("float64(float64)")(integrand)
do_integrate(nb_integrand.ctypes)
Some errors come from mismatched data types; adjust the code according to the error message. Other errors may come from functions or features that numba does not support; for details refer to the documentation pages on supported Python features and supported NumPy features.
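As an illustration of the second kind of error (a sketch of my own, assuming a recent Numba where TypingError lives in numba.core.errors): forcing nopython mode on code that uses pandas fails at the first call with a TypingError, and the message points at the unsupported construct:

import pandas as pd
from numba import jit
from numba.core.errors import TypingError

@jit(nopython=True)
def make_frame(n):
    # pandas is not supported in nopython mode, so typing this call fails
    return pd.DataFrame({'a': list(range(n))})

try:
    make_frame(3)
except TypingError as err:
    print("nopython compilation failed:", str(err).splitlines()[0])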
As an example, let's take the SVD algorithm used in recommendation systems and show the speed-up the numba library provides.
import numpy as np
import time
import pandas as pd
from numba import jit, prange
@jit(nopython=True, cache=True, nogil=True, parallel=True)
def svd(users, items, iterations, lr, reg, factors, avg, data):
    # initialization
    bu = np.random.normal(loc=0, scale=0.1, size=(users, 1))
    bi = np.random.normal(loc=0, scale=0.1, size=(items, 1))
    p = np.random.normal(loc=0, scale=0.1, size=(users, factors))
    q = np.random.normal(loc=0, scale=0.1, size=(items, factors))
    # iteration
    for iteration in prange(iterations):
        # note: iterating as `for u, i, r in trainset` raises an error in nopython mode
        for line in prange(data.shape[0]):
            u, i, r = data[line]
            rp = avg + bu[u] + bi[i] + np.dot(q[i], p[u])
            e_ui = r - rp
            bu[u] += lr * (e_ui - reg * bu[u])
            bi[i] += lr * (e_ui - reg * bi[i])
            p[u] += lr * (e_ui * q[i] - reg * p[u])
            q[i] += lr * (e_ui * p[u] - reg * q[i])
nUsers = 100 # number of users
nItems = 100 # number of items
iteration = 30 # number of iterations
lr = 0.01 # learning rate
reg = 0.002 # regularization rate
factor = 5 # number of factors
trainset = pd.read_csv("D:/py3/trainset.txt", sep=' ', header=None).values
aver = np.mean(trainset[:, 2]) # average rating
start = time.perf_counter()  # time.clock() was removed in Python 3.8
svd(nUsers, nItems, iteration, lr, reg, factor, aver, trainset)
end = time.perf_counter()
print("training time: %s seconds" % (end - start))
If the @jit decorator line is removed, i.e., without numba acceleration, the result is:
training time: 17.564660734381764 seconds
With numba acceleration, the first run also has to compile the function:
training time: 9.588356848133849 seconds
After that, repeated runs stabilize at around:
training time: 0.18296860820512134 seconds
So numba improved the speed by roughly 96x here; generally speaking, numba can give a speed-up of one to two orders of magnitude.