
What? Python too slow? Try the Numba library!


  • Official documentation
  • Python's compilation process and execution model
  • A brief introduction to Numba
  • When Numba is effective
  • The @jit decorator
    • signature parameter (data type control)
    • nopython and forceobj parameters (compilation mode selection)
    • nogil parameter (releasing the global interpreter lock)
    • cache parameter (file-based caching)
    • parallel parameter (automatic parallelization)
    • error_model parameter
    • fastmath parameter
    • locals parameter
    • boundscheck parameter
  • The @generated_jit() decorator
  • The @vectorize() decorator
  • The @jitclass() decorator
  • The @cfunc() decorator
  • Coding guidelines
  • Performance comparison example: the SVD algorithm

Official documentation

The official Numba documentation is the authoritative reference; consult it for anything not covered here.

Python's compilation process and execution model

This section draws on two external references (link 1 and link 2).
In compiled languages such as C and C++, the compiler converts source files into machine code, and the linker then combines the result into a binary executable. To run the program, the binary is loaded from disk into memory and executed. Python, as an interpreted language, has no separate compile-and-link step: the interpreter converts the source code into bytecode and then executes that bytecode, so the programmer does not have to worry about compiling the program or linking and loading libraries.

A Python interpreter (such as CPython, PyPy, Jython, or IronPython) runs Python code in four stages:

  1. Lexical analysis
    Splits the source text into tokens and checks that keywords are valid.
  2. Syntax analysis
    Checks that statements are well formed.
  3. Bytecode generation
    Produces a code object (PyCodeObject), which is cached on disk as a .pyc file for imported modules. During compilation, the functions, classes, and other objects in the source are processed separately, and then the bytecode is generated.
  4. Execution
    The interpreter's virtual machine executes the bytecode one instruction at a time; each instruction is carried out by native code inside the interpreter rather than being translated into machine code of its own.
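
The four stages can be observed from Python itself with the standard library alone. The following is a minimal sketch; the one-line `source` program is a made-up example, and `dis.dis` only prints a human-readable listing of the bytecode.

```python
import ast
import dis
import io
import tokenize

source = "total = 1 + 2\n"  # hypothetical one-line program

# 1. Lexical analysis: split the source text into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# 2. Syntax analysis: build a syntax tree (raises SyntaxError if malformed).
tree = ast.parse(source)

# 3. Bytecode generation: compile the tree into a code object.
code = compile(tree, filename="<example>", mode="exec")

# 4. Execution: the interpreter runs the bytecode in a namespace.
namespace = {}
exec(code, namespace)

print(token_strings)                 # ['total', '=', '1', '+', '2']
print(type(tree.body[0]).__name__)   # Assign
dis.dis(code)                        # listing of the generated bytecode
print(namespace["total"])            # 3
```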

The common CPython interpreter is written in C and interprets the bytecode,
whereas Numba uses LLVM to compile it.

We covered Cython in an earlier article. In essence, Cython speeds code up by replacing the complex low-level work that CPython does at run time with C (Python's dynamic typing involves a lot of type checking, polymorphic dispatch, and overflow checking, all of which take time; Cython's static types avoid most of that overhead).

Numba takes a different approach: it compiles with LLVM. Numba translates Python bytecode into LLVM intermediate representation (IR). Note that LLVM IR is a low-level programming language with an assembly-like syntax, and it has nothing to do with Python.

A brief introduction to Numba

Numba is a just-in-time (JIT) compiler for Python. It works best on code that uses NumPy arrays, NumPy functions, and loops. The most common way to use Numba is through its collection of decorators, which you apply to your functions to tell Numba to compile them. When a Numba-decorated function is called, it is compiled to machine code "just in time" for execution, and all or part of your code can then run at native machine-code speed.

When Numba is effective

It depends on what your code looks like. If your code is numerically oriented (does a lot of math), uses NumPy heavily, and/or has many loops, Numba is usually a good choice. In the examples below we apply the most basic Numba JIT decorator, @jit, to a few functions to show what works well and what does not.

Numba is very effective for code like the following:

from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)  # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a):  # Function is compiled to machine code when called the first time
    trace = 0
    for i in range(a.shape[0]):  # Numba likes loops
        trace += np.tanh(a[i, i])  # Numba likes NumPy functions
    return a + trace  # Numba likes NumPy broadcasting

print(go_fast(x))

Numba does not help much with code like the following:

from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit(forceobj=True)  # object mode; a bare @jit fell back to it automatically before Numba 0.59
def use_pandas(a):  # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a)  # Numba doesn't know about pd.DataFrame
    df += 1  # Numba doesn't understand what this is
    return df.cov()  # or this!

print(use_pandas(x))

The @jit decorator

@numba.jit(signature=None, nopython=False, nogil=False, cache=False, forceobj=False, parallel=False, error_model='python', fastmath=False, locals={}, boundscheck=False)

All parameters of the jit decorator are optional. With a bare @jit, Numba decides how to optimize the function on its own.

signature parameter (data type control)

Let's start with a simple example:

from numba import jit

@jit
def f(x, y):
    # A somewhat trivial example
    return x + y

Here both f(10, 3) and f(1j, 2) run. But what if you need to control the data types of the arguments and the return value?

from numba import jit, int32, float32, double

@jit(double(float32, int32))
def f(x, y):
    # A somewhat trivial example
    return x + y

Here double(float32, int32) is the function signature: double sets the type of the return value, while float32 and int32 set the types of x and y respectively. The return type may be omitted, in which case Numba infers it.
With this signature in place, calling f(1j, 2) raises an error, which is how the data types are enforced.

Common type signatures:

  • void is the return type of functions that return nothing
  • intp and uintp are pointer-sized integers (signed and unsigned, respectively)
  • intc and uintc correspond to C's int and unsigned int
  • int8, uint8, int16, uint16, int32, uint32, int64, uint64 are fixed-width integers of the corresponding bit width
  • float32 and float64 are single- and double-precision floating-point numbers
  • complex64 and complex128 are single- and double-precision complex numbers
  • array types, such as float32[:] and int8[:,:]

nopython and forceobj parameters (compilation mode selection)

Both parameters are Booleans. nopython=True compiles the function in nopython mode, and forceobj=True compiles it in object mode.

nopython mode: a Numba compilation mode that generates code which does not access the Python C API. This mode produces the highest-performing code, but it requires that the native types of all values in the function can be inferred. In older Numba releases, unless told otherwise, the @jit decorator automatically fell back to object mode when nopython mode could not be used; since Numba 0.59, @jit compiles in nopython mode by default and no longer falls back.

object mode: a Numba compilation mode that generates code which treats all values as Python objects and uses the Python C API to perform every operation on them. Code compiled in object mode is usually no faster than interpreted Python code, unless the Numba compiler can take advantage of loop-jitting.

In general, nopython mode is recommended, since the whole point of using Numba is to run faster. The trade-off is that it restricts what the code may contain.

nogil parameter (releasing the global interpreter lock)

If nogil is True, the compiled function releases the global interpreter lock (GIL) while it runs, so multithreaded code can make effective use of multi-core systems. This is only possible in nopython mode. You must also watch out for the usual pitfalls of multithreaded programming (consistency, synchronization, race conditions, and so on).

cache parameter (file-based caching)

If cache is True, Numba enables a file-based cache that stores the compiled function on disk, which cuts compilation time when the function has already been compiled in a previous run.

parallel parameter (automatic parallelization)

If parallel is True, Numba automatically parallelizes many common NumPy constructs and fuses adjacent parallel operations to maximize cache locality.

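error_model parameter

Either 'python' or 'numpy'. It controls how division by zero is handled: 'python' raises an exception, as Python does, while 'numpy' follows NumPy and sets the result to +/-inf or nan.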

fastmath parameter

fastmath enables the otherwise unsafe floating-point transformations described in the LLVM documentation. In addition, if Intel SVML is installed, faster but less accurate versions of some math intrinsics are used.

locals parameter

A dictionary mapping the names of particular local variables to the types they should be given.

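boundscheck parameter

Whether to check array indices against the array bounds. Checking slows the code down, so leave it at False for production runs and set it to True while debugging, so that an out-of-bounds access raises an IndexError instead of silently reading or writing invalid memory.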

The @generated_jit() decorator

@numba.generated_jit(nopython=False, nogil=False, cache=False, forceobj=False, locals={})

The generated_jit() decorator lets a function choose a different implementation depending on the types of the arguments passed in, while keeping the speed of the jit() decorator. Note that generated_jit() was deprecated in Numba 0.57 and removed in 0.59, so the example below needs an older release; numba.extending.overload is the replacement.

# Returns whether the given value is a missing value
import numpy as np
from numba import generated_jit, types

@generated_jit(nopython=True)
def is_missing(x):
    """Return True if the value is missing, False otherwise."""
    if isinstance(x, types.Float):
        return lambda x: np.isnan(x)
    elif isinstance(x, (types.NPDatetime, types.NPTimedelta)):
        # The corresponding Not-a-Time value
        missing = x('NaT')
        return lambda x: x == missing
    else:
        return lambda x: False

The @vectorize() decorator

@numba.vectorize(*, signatures=[], identity=None, nopython=True, target='cpu', forceobj=False, cache=False, locals={})

Compiles the decorated function and wraps it as a NumPy ufunc or a Numba DUFunc.

The @jitclass() decorator

jitclass() compiles all methods of a class in nopython mode.

import numpy as np
from numba.experimental import jitclass  # import the decorator (numba.jitclass before Numba 0.49)
from numba import int32, float32  # import the types

spec = [
    ('value', int32),  # a simple scalar field
    ('array', float32[:]),  # an array field
]

@jitclass(spec)
class Bag(object):
    def __init__(self, value):
        self.value = value
        self.array = np.zeros(value, dtype=np.float32)

    @property
    def size(self):
        return self.array.size

    def increment(self, val):
        for i in range(self.size):
            self.array[i] += val
        return self.array

The @cfunc() decorator

cfunc() creates a compiled function that can be called from foreign C code, so it can interact with libraries written in C or C++. Since many Python libraries are implemented in C or C++ under the hood, this is very useful.
For example, scipy.integrate.quad accepts either a regular Python callback or a C callback wrapped in a ctypes callback object.
Using a regular Python callback:

import numpy as np
import scipy.integrate as si

def integrand(t):
    return np.exp(-t) / t ** 2

def do_integrate(func):
    """Integrate the given function from 1.0 to +inf."""
    return si.quad(func, 1, np.inf)

do_integrate(integrand)

Using the cfunc() decorator:

import numpy as np
import scipy.integrate as si
from numba import cfunc

def integrand(t):
    return np.exp(-t) / t ** 2

def do_integrate(func):
    """Integrate the given function from 1.0 to +inf."""
    return si.quad(func, 1, np.inf)

# Compile integrand as a C callback taking and returning a double
nb_integrand = cfunc("float64(float64)")(integrand)
do_integrate(nb_integrand.ctypes)

Coding guidelines

Some errors come from mismatched data types; adjust the code according to the error message. Others arise because Numba does not support certain functions or features. For details, see the "Supported Python features" and "Supported NumPy features" pages of the official documentation.

Performance comparison example: the SVD algorithm

We use the SVD algorithm from recommender systems as an example to show the speedup Numba provides.

import numpy as np
import time
import pandas as pd
from numba import jit, prange

@jit(nopython=True, cache=True, nogil=True, parallel=True)
def svd(users, items, iterations, lr, reg, factors, avg, data):
    # initialization
    bu = np.random.normal(loc=0, scale=0.1, size=(users, 1))
    bi = np.random.normal(loc=0, scale=0.1, size=(items, 1))
    p = np.random.normal(loc=0, scale=0.1, size=(users, factors))
    q = np.random.normal(loc=0, scale=0.1, size=(items, factors))
    # iteration
    # Caution: both loops update the shared arrays bu, bi, p and q, so running
    # them with prange is a data race; use range if results must be reproducible.
    for iteration in prange(iterations):
        # wrong: for u, i, r in trainset:
        for line in prange(data.shape[0]):
            u, i, r = data[line]
            rp = avg + bu[u] + bi[i] + np.dot(q[i], p[u])
            e_ui = r - rp
            bu[u] += lr * (e_ui - reg * bu[u])
            bi[i] += lr * (e_ui - reg * bi[i])
            p[u] += lr * (e_ui * q[i] - reg * p[u])
            q[i] += lr * (e_ui * p[u] - reg * q[i])

nUsers = 100  # number of users
nItems = 100  # number of items
iteration = 30  # number of iterations
lr = 0.01  # learning rate
reg = 0.002  # regularization rate
factor = 5  # number of factors
trainset = pd.read_csv("D:/py3/trainset.txt", sep=' ', header=None).values
aver = np.mean(trainset[:, 2])  # average rating
start = time.perf_counter()  # time.clock() was removed in Python 3.8
svd(nUsers, nItems, iteration, lr, reg, factor, aver, trainset)
end = time.perf_counter()
print("training time: %s seconds" % (end - start))

If you remove the @jit decorator and run without Numba acceleration, the result is

training time: 17.564660734381764 seconds

With Numba acceleration, the first run has to compile the function

training time: 9.588356848133849 seconds

Subsequent runs settle at about

training time: 0.18296860820512134 seconds

Numba makes this code about 96 times faster (17.56 s versus 0.18 s). In general, Numba can deliver a speedup of one to two orders of magnitude.

