Contents
map
apply
map_async and apply_async
imap and imap_unordered
starmap and starmap_async
Python's standard multiprocessing library offers several ways to run tasks in multiple processes through a process pool (multiprocessing.Pool). This article explains the differences between these methods: map, apply, map_async, apply_async, imap, imap_unordered, starmap, and starmap_async.
map
Function signature: map(func, iterable[, chunksize])
As the signature suggests, map applies func to each element of iterable in turn, creating one func task per element. How many of these tasks run in parallel depends on the number of processes specified when the pool was created. If any of map's subtasks raises an exception, map raises that exception and none of the results can be retrieved, although the other subtasks still run to completion. When chunksize is larger than 1, this happens at chunk granularity: the chunk in which the exception occurs stops immediately, while the other chunks keep running normally, but the overall result is still unavailable.
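A minimal sketch of both behaviors, assuming a pool of 4 processes; `square` and `fragile` are hypothetical toy functions, and `fragile` fails on the input 3 on purpose. The successful call returns one list in input order; the failing call re-raises the worker's exception in the main process and returns no results at all.

```python
import multiprocessing as mp


def square(x):
    """Hypothetical toy task: return x squared."""
    return x * x


def fragile(x):
    """Hypothetical toy task that fails on one specific input."""
    if x == 3:
        raise ValueError(f"bad input: {x}")
    return x * x


def run_map():
    with mp.Pool(processes=4) as pool:
        # One task per element; results come back as a list in input order.
        return pool.map(square, range(10))


def run_failing_map():
    with mp.Pool(processes=4) as pool:
        try:
            return pool.map(fragile, range(10))
        except ValueError as exc:
            # The worker's exception is re-raised here, and none of the
            # successful results are available.
            return exc


if __name__ == "__main__":
    print(run_map())          # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    print(run_failing_map())  # bad input: 3
```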
If chunksize is set, map splits iterable into chunks of chunksize elements each (the last chunk may be smaller). Each chunk is handed to a single worker process as one task and executed there sequentially. This can waste resources: near the end of the job, once no chunks remain in the queue, a worker that has finished its own chunk sits idle, because it cannot take over the unfinished elements inside another worker's chunk. With a chunksize of 1 this does not happen: as soon as a worker finishes one task it fetches the next element, so it is never idle while work remains. The drawback is that, for a long iterable, every element is pickled and sent to a worker individually, and this per-task communication overhead reduces efficiency; a larger chunksize sends many arguments to a process at once and avoids it. Note that chunksize=1 is not map's default: when chunksize is omitted, CPython's map (like map_async and starmap) computes one itself, namely len(iterable) / (4 × number of processes), rounded up. imap and imap_unordered, by contrast, default to chunksize=1.
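The chunk-to-worker assignment can be observed directly. In this sketch (the helper names `tag_with_pid` and `chunk_owners` are hypothetical), each element is returned together with the PID of the worker that processed it; with 20 elements and chunksize=5 there are 4 chunks, and every element of a given chunk reports the same PID.

```python
import multiprocessing as mp
import os


def tag_with_pid(x):
    """Return the input together with the PID of the worker that ran it."""
    return x, os.getpid()


def chunk_owners(n_items=20, chunksize=5, processes=2):
    with mp.Pool(processes=processes) as pool:
        pairs = pool.map(tag_with_pid, range(n_items), chunksize=chunksize)
    # Chunk i holds the elements i*chunksize .. (i+1)*chunksize - 1.
    owners = {}
    for x, pid in pairs:
        owners.setdefault(x // chunksize, set()).add(pid)
    return owners


if __name__ == "__main__":
    for chunk, pids in sorted(chunk_owners().items()):
        # Each chunk is executed by exactly one worker process.
        print(chunk, pids)
```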
A very long iterable can also consume too much memory: map first converts an iterable without a length (such as a generator) into a list, and it stores the results of all subtasks in a single list, which for a very long input can take up a lot of memory. In that case, imap or imap_unordered (described below), combined with a suitable chunksize, can reduce memory usage considerably.
apply
Function signature: apply(func[, args[, kwds]])
apply submits func as a single task to a worker process and runs it; args and kwds are the optional positional and keyword arguments passed to func. apply blocks until that task has finished and then returns its result, so only one task runs at a time and the next one cannot start until the current one is done. apply on its own therefore cannot use the worker processes to run multiple tasks in parallel.
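A small timing sketch of this blocking behavior, assuming a hypothetical task `slow_add` that sleeps for `delay` seconds (passed via kwds). Even though the pool has 3 processes, three consecutive apply calls run one after another, so the total time is at least about 3 × 0.2 s.

```python
import multiprocessing as mp
import time


def slow_add(a, b, delay=0.2):
    """Hypothetical toy task: sleep for `delay` seconds, then return a + b."""
    time.sleep(delay)
    return a + b


def run_apply_three_times():
    with mp.Pool(processes=3) as pool:
        start = time.perf_counter()
        # Each apply() call blocks until its task has finished,
        # so the three tasks run sequentially, not in parallel.
        results = [pool.apply(slow_add, args=(i, 10), kwds={"delay": 0.2})
                   for i in range(3)]
        elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_apply_three_times()
    print(results)            # [10, 11, 12]
    print(f"{elapsed:.2f}s")  # at least about 0.6s
```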
map_async and apply_async
map_async function signature: map_async(func, iterable[, chunksize[, callback[, error_callback]]])
apply_async function signature: apply_async(func[, args[, kwds[, callback[, error_callback]]]])
Unlike map and apply, these two functions do not block the main process: they submit the tasks and return immediately. If the program needs to wait until all tasks have finished, it can call the pool's join method, which blocks the main process until all worker processes have exited. Note that close must be called before join: close signals that no more tasks will be submitted, and only after that is join allowed (calling join on a pool that is still running raises an error).
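A minimal sketch of the submit, close, join pattern. The task `slow_square` and the helper names are hypothetical. Results are gathered through the callback parameter from the signature, which the pool invokes in the main process (on its result-handling thread); since completion order is not guaranteed, the collected values are sorted. The second helper shows that join without a preceding close is rejected; recent Python versions (3.8+) raise ValueError here.

```python
import multiprocessing as mp
import time


def slow_square(x):
    """Hypothetical toy task: sleep briefly, then return x squared."""
    time.sleep(0.1)
    return x * x


def run_async_then_join(n=8):
    collected = []  # filled by the callback, which runs in the main process
    with mp.Pool(processes=4) as pool:
        for i in range(n):
            # apply_async returns an AsyncResult immediately; it does not block.
            pool.apply_async(slow_square, args=(i,), callback=collected.append)
        pool.close()  # no more tasks will be submitted
        pool.join()   # block the main process until every worker has exited
    # Completion order is not guaranteed, so sort before returning.
    return sorted(collected)


def join_without_close():
    with mp.Pool(processes=2) as pool:
        try:
            pool.join()  # close() was not called first
        except ValueError as exc:  # raised by Python 3.8 and later
            return str(exc)
    return None


if __name__ == "__main__":
    print(run_async_then_join())  # [0, 1, 4, 9, 16, 25, 36, 49]
    print(join_without_close())
```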
Both functions return an AsyncResult object, whose get method returns the result. For map_async, get returns a list containing the results of all tasks, in the same order as the input iterable. For apply_async, get returns the value returned by func. If callback is given, it is called in the main process with that same result once it is ready; if a task raises an exception, get re-raises it, and error_callback, if given, is called with the exception instead.
AsyncResult's get method blocks the main process until the corresponding tasks have finished. In other words, map_async(func, iterable).get() is equivalent to map(func, iterable), and apply_async(func, args).get() is equivalent to apply(func, args); in CPython, map and apply are in fact implemented this way.
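A sketch of that equivalence, using a hypothetical toy function `cube`. Both asynchronous calls return immediately, leaving the main process free until get is called; get (here with a timeout, which raises multiprocessing.TimeoutError if the result is not ready in time) then yields exactly what the blocking forms return.

```python
import multiprocessing as mp


def cube(x):
    """Hypothetical toy task: return x cubed."""
    return x ** 3


def compare_sync_and_async():
    with mp.Pool(processes=4) as pool:
        async_map = pool.map_async(cube, range(8))  # returns immediately
        async_one = pool.apply_async(cube, (5,))     # returns immediately
        # ... the main process could do other work here ...
        async_results = (async_map.get(timeout=10),  # list in input order
                         async_one.get(timeout=10))  # cube's return value
        sync_results = (pool.map(cube, range(8)), pool.apply(cube, (5,)))
    return async_results, sync_results


if __name__ == "__main__":
    async_results, sync_results = compare_sync_and_async()
    print(async_results)  # ([0, 1, 8, 27, 64, 125, 216, 343], 125)
    print(async_results == sync_results)  # True
```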
imap is the lazy version of map: instead of producing the results of all subtasks at once and returning them in a list, it returns an iterator. The main process iterates over it and handles each result as soon as it becomes available, rather than waiting until every subtask has finished, which can greatly reduce memory usage. For a long iterable, imap should also be given a reasonable chunksize: its default chunksize is 1, which incurs the per-task overhead described above.
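A minimal sketch of consuming imap results on the fly, assuming a hypothetical toy function `square` and an arbitrary chunksize of 1000. Each result is added to a running total and then discarded, so the full list of results is never built.

```python
import multiprocessing as mp


def square(x):
    """Hypothetical toy task: return x squared."""
    return x * x


def sum_of_squares(n, chunksize=1000):
    """Sum the squares of 0..n-1 without holding all results in a list."""
    total = 0
    with mp.Pool(processes=4) as pool:
        # imap yields results one by one, in input order;
        # chunksize controls how many elements are sent per task.
        for value in pool.imap(square, range(n), chunksize=chunksize):
            total += value
    return total


if __name__ == "__main__":
    print(sum_of_squares(100_000))
```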
However, imap has a drawback: its iterator returns results in input order. Even if the subtasks for later elements of iterable have already finished and their results are ready for further processing, a slow task for the first element keeps the main process blocked, so it cannot handle the results that already exist. The iterator only advances in order: it waits until the subtask for the current element has finished before moving on to the next result.
imap_unordered addresses both the potential memory consumption (finished results that cannot be returned yet have to be buffered) and the wasted CPU time caused by imap's in-order iteration. Unlike imap, imap_unordered does not require the iterator to preserve order: as soon as any subtask finishes and returns a result, the iterator makes it available to the main process for the next step, instead of blocking the main process to preserve order. The price is that results arrive in completion order, so if the caller needs to know which input produced a result, func should return that information along with the result.
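The difference can be sketched with a hypothetical task `slow_first` in which input 0 takes about one second and every other input is instant, run on a pool of 2 processes. imap yields nothing until task 0 is done and always returns results in input order; imap_unordered hands over the fast results first, so 0 is expected to come last (the exact order depends on scheduling).

```python
import multiprocessing as mp
import time


def slow_first(x):
    """Hypothetical toy task: input 0 takes about one second, others are instant."""
    if x == 0:
        time.sleep(1.0)
    return x


def collect(unordered, n=6, processes=2):
    with mp.Pool(processes=processes) as pool:
        iterate = pool.imap_unordered if unordered else pool.imap
        # chunksize defaults to 1 for both, so each input is a separate task.
        return list(iterate(slow_first, range(n)))


if __name__ == "__main__":
    print(collect(unordered=False))  # [0, 1, 2, 3, 4, 5]
    print(collect(unordered=True))   # typically 0 arrives last
```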
starmap and starmap_async differ from map and map_async only in that each element of iterable must itself be iterable; each element is unpacked into positional arguments for func. For example, if iterable is [(1, 2), (3, 4)], the subtasks are func(1, 2) and func(3, 4).
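A short sketch with a hypothetical two-argument function `power`: each tuple is unpacked into the arguments base and exponent, and starmap_async returns an AsyncResult whose get yields the same ordered list as starmap.

```python
import multiprocessing as mp


def power(base, exponent):
    """Hypothetical toy task taking two positional arguments."""
    return base ** exponent


def run_starmap():
    pairs = [(1, 2), (3, 4), (2, 10)]
    with mp.Pool(processes=2) as pool:
        # Each tuple is unpacked: power(1, 2), power(3, 4), power(2, 10).
        sync = pool.starmap(power, pairs)
        async_result = pool.starmap_async(power, pairs)
        return sync, async_result.get(timeout=10)


if __name__ == "__main__":
    print(run_starmap())  # ([1, 81, 1024], [1, 81, 1024])
```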