Thursday, September 27, 2012

Using map() or apply_async() For Multiprocessing


With the release of this new Python article on the ArcGIS blogs, I am finding it increasingly hard to find concrete, simple explanations of the important functions that make processing data in ArcGIS easier.
Though the crux of that article is guidance on when, and when not, to use multiprocessing with ArcGIS, it uses a variation on multiprocessing that differs from the method I prefer.  So I decided to investigate the differences between map() and apply_async() on the Pool object.
Pool.apply_async() is like Python's built-in apply() (a Python 2 built-in), except that the call returns immediately instead of waiting for the result. An ApplyResult object is returned. You call its get() method to retrieve the result of the function call, and get() blocks until the function has completed. Tasks submitted with apply_async() are not guaranteed to finish in the order they were submitted, so results gathered as they complete (for example, through a callback) will not necessarily line up 1:1 with the order of the calls.  In addition, because each call names its own function, apply_async() lets you dispatch several different functions to the same pool instead of a single function.
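Here is a minimal sketch of that pattern, written for Python 3. The worker functions square() and cube() and the helper run_apply_async() are made-up names for illustration, standing in for real geoprocessing tasks; nothing here is taken from the ArcGIS article.

```python
import multiprocessing


def square(x):
    # Hypothetical worker; stands in for a real processing task.
    return x * x


def cube(x):
    # A second hypothetical worker, to show that one pool can run
    # different functions through apply_async().
    return x * x * x


def run_apply_async(values):
    with multiprocessing.Pool(processes=2) as pool:
        # Each call returns immediately with a result handle.
        pending = [pool.apply_async(square, (v,)) for v in values]
        pending_cube = pool.apply_async(cube, (3,))
        # get() blocks until that particular task has finished.
        # Reading the handles in submission order keeps the list
        # aligned with the inputs, whatever order the tasks ran in.
        squares = [p.get() for p in pending]
        return squares, pending_cube.get()


if __name__ == "__main__":
    print(run_apply_async([1, 2, 3, 4]))
```

Keeping the handles in a list and calling get() on each one in turn is one way to recover the submission order. If you collect results through a callback instead, they arrive in completion order.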
Pool.map() is like Python's built-in map() function: it applies the same function to many arguments.  The call blocks until every result is ready, and the results are returned in an order corresponding to the order of the arguments.
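For comparison, here is a rough sketch of the map() approach, again in Python 3. The square() worker and the run_map() helper are hypothetical names used only for illustration.

```python
import multiprocessing


def square(x):
    # Hypothetical worker; stands in for a real processing task.
    return x * x


def run_map(values):
    with multiprocessing.Pool(processes=2) as pool:
        # map() blocks until all inputs are processed and returns
        # the results in the same order as the inputs.
        return pool.map(square, values)


if __name__ == "__main__":
    print(run_map([3, 1, 2]))
```

There is one function, one list of inputs, and one ordered list of results, which is why map() tends to be the simpler choice when every item gets the same treatment.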

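When it comes right down to it, you need to find the right tool for the job: map() when one function runs over many inputs and you want the results back in order, apply_async() when you want calls that return immediately or need to run different functions.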

Here are some resources to help you along.

Enjoy

1 comment:

Unknown said...

Yes, it seems there are more examples slowly becoming available for ArcGIS and Python multiprocessing. I've started implementing it in some of my scripts but not for either reason mentioned in the esri blog. I use the multiprocessing.Queue, multiprocessing.Process to create a job and then use job.start(). It works for me. Just my two cents.