Concurrency in Python
Check out the wonderful material provided by Raymond Hettinger.
Prior Knowledge
Global Interpreter Lock (GIL)
CPython has a lock for its internal shared global state.
The unfortunate effect of the GIL is that no more than one thread can run at a time.
For I/O-bound problems, the GIL is not an issue; for CPU-bound problems, threading makes the application slower, which makes multiprocessing the better choice.
A problem is I/O bound if it would go faster if the I/O subsystem was faster, e.g. disk, networking, communication. For example, a program that looks through a huge file for some data might become I/O bound since the bottleneck is then the reading of the data from disk.
A problem is CPU bound if it would go faster with a faster CPU, i.e. it spends the majority of its time simply using the CPU (doing calculations). For example, a program that computes new digits of pi will typically be CPU bound.
Threads vs. Processes
Both strength and weakness of threads is shared state, i.e. managing race conditions.
Race condition occurs when multiple threads can access the shared data and are trying to change it at the same time.
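A minimal sketch of guarding shared state with a lock (the counter and thread count here are arbitrary choices for illustration): each `counter += 1` is a read-modify-write, so without the lock, concurrent threads can interleave and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # guard the read-modify-write critical section
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; without it, updates may be lost
```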
The strength of processes is their independence from one another. The weakness is lack of communication.
Threads vs. Async
For threads, critical sections have to be guarded with locks.
Async switches cooperatively (explicitly). So locks and other synchronization are no longer needed.
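A small sketch of why no lock is needed under cooperative switching: the event loop only switches tasks at an explicit `await`, so the read-modify-write on the shared counter can never be interrupted partway.

```python
import asyncio

counter = 0

async def increment(n):
    global counter
    for _ in range(n):
        # No lock needed: control only changes hands at an explicit await,
        # so this read-modify-write cannot be interrupted mid-way.
        counter += 1
        await asyncio.sleep(0)  # cooperatively yield to the event loop

async def main():
    await asyncio.gather(increment(10_000), increment(10_000))

asyncio.run(main())
print(counter)  # 20000, deterministically
```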
Rule of Thumb on What to Use
- Async maximizes CPU utilization because it has less overhead than threads.
- Threading typically works with existing code and tools as long as locks are added around critical sections.
- For complex systems, async is much easier to get right than threads with locks.
- Threads require very little tooling (locks and queues).
- Async needs a great deal of tooling (futures, event loops, and non-blocking versions of just about everything).
What does non-blocking mean?
Threading
Check out the official documentation of the threading module.
Thread Objects
Once a thread object is created, its activity must be started by calling the thread's start() method, which invokes the run() method.
A thread can be flagged as a "daemon thread". The significance of this flag is that the entire Python program exits when only daemon threads are left.
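A sketch tying the two points together: a `Thread` subclass whose `run()` is invoked by `start()`, flagged as a daemon so it would not keep the program alive on its own (the `Worker` class and its name are illustrative).

```python
import threading

class Worker(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name, daemon=True)  # daemon: won't block program exit
        self.result = None

    def run(self):  # invoked by start(), runs in the new thread
        self.result = f"done by {self.name}"

w = Worker("worker-1")
w.start()   # spawns the thread and calls run()
w.join()    # wait for the thread to finish
print(w.result)  # done by worker-1
print(w.daemon)  # True
```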
Combining Threading and Forking
Wiki on Fork-join model
An interesting blog on forking
Start methods, e.g. fork, spawn ...
The general rule is thread after you fork, not before. Otherwise, the locks used by the thread executor will get duplicated across processes. If one of those processes dies while it holds the lock, all of the other processes using that lock will deadlock.
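A quick way to inspect and select a start method with the standard library; `spawn` starts a fresh interpreter, so locks held by threads in the parent are never silently duplicated into the child.

```python
import multiprocessing as mp

# Available start methods depend on the platform
# (e.g. fork/spawn/forkserver on Linux, spawn only on Windows).
print(mp.get_all_start_methods())

# A context pins the start method without changing the global default.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())  # spawn
```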
fork() copies everything in memory. However, fork() does not copy everything: threads, for example, are not copied. Only the thread that called fork() exists in the child process.
Async
Here is a wonderful blog on asynchronous Python.
Building Blocks
- event loop manages and distributes the execution of different tasks.
- coroutines are special functions that work similarly to Python generators: on await they release the flow of control back to the event loop. A coroutine needs to be run on the event loop; once scheduled, coroutines are wrapped in Tasks, which is a type of Future.
- futures represent the result of a task that may or may not have been executed. This result may be an exception.
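The three building blocks above can be sketched in a few lines: `asyncio.run` provides the event loop, `fetch` is a coroutine, and scheduling it produces a Task, which is a kind of Future holding the eventual result (the names `fetch`/`main` are illustrative).

```python
import asyncio

async def fetch(name, delay):
    # A coroutine: releases control back to the event loop at each await.
    await asyncio.sleep(delay)
    return f"{name} finished"

async def main():
    # Scheduling a coroutine wraps it in a Task (a subclass of Future).
    task = asyncio.create_task(fetch("a", 0.01))
    print(isinstance(task, asyncio.Future))  # True
    # Awaiting the task retrieves the future's result (or raises its exception).
    return await task

result = asyncio.run(main())  # the event loop drives all tasks to completion
print(result)  # a finished
```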
Multiprocessing
Pickle
Read official documentation on what can and cannot be pickled.
Some objects cannot be pickled, for example:
- inner functions / closures,
- inner classes,
- lambda functions.
The dill package can be used to check if objects can be pickled:
import dill
dill.pickles(object) # returns True if picklable
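Even without dill, the failure modes are easy to demonstrate with the stdlib: lambdas and inner functions are pickled by reference to a module-level name, which does not exist for them.

```python
import pickle

def outer():
    def inner():        # inner function: has no importable module-level name
        return 42
    return inner

lambda_picklable = True
try:
    pickle.dumps(lambda x: x)   # lambdas cannot be pickled by the stdlib
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print(lambda_picklable)  # False

inner_picklable = True
try:
    pickle.dumps(outer())       # neither can inner functions
except (pickle.PicklingError, AttributeError):
    inner_picklable = False
print(inner_picklable)  # False
```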
Handle Stateful Objects
If pickling some attributes of an instance is too expensive, it is recommended to delete them from __dict__ (in __getstate__) and then recover them later. Recovery can be done in the __setstate__ method or another user-defined method.
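A minimal sketch of this pattern (the `Dataset` class, its path, and the cheap `_load` stand-in are all hypothetical): the expensive attribute is dropped from the pickled state and rebuilt on unpickling.

```python
import pickle

class Dataset:
    def __init__(self, path):
        self.path = path
        self.cache = self._load()   # expensive, but rebuildable, attribute

    def _load(self):
        return [1, 2, 3]            # stand-in for expensive work

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["cache"]          # don't pickle the expensive attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.cache = self._load()   # recover it after unpickling

d = pickle.loads(pickle.dumps(Dataset("/tmp/data")))
print(d.path, d.cache)  # /tmp/data [1, 2, 3]
```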
Read gluon.data.dataloader
Dive deep into real-world code and learn. What are Queue, SimpleQueue and Pipe?
- Prefetch: execute _push_next() many times to push (sent_idx, idx_list) to key_queue.
- Produce: get an item from key_queue and generate a batch, then put (sent_idx, batch) into data_queue.
- Consume: fetch an item from data_queue and push it into data_buffer.
Use multiple processes to prepare data_buffer and a single thread to read from it.
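The prefetch/produce/consume pipeline can be sketched with the stdlib. For a self-contained example this uses threads and queue.Queue rather than the worker processes and multiprocessing queues the real dataloader uses; the tiny `dataset`, batch indices, and sentinel protocol are illustrative.

```python
import queue
import threading

key_queue = queue.Queue()    # (sent_idx, idx_list) pairs waiting to be produced
data_queue = queue.Queue()   # (sent_idx, batch) pairs waiting to be consumed
data_buffer = {}             # finished batches, keyed by sent_idx

dataset = {i: f"sample-{i}" for i in range(6)}
batches = [[0, 1], [2, 3], [4, 5]]

# Prefetch: push (sent_idx, idx_list) into key_queue.
for sent_idx, idx_list in enumerate(batches):
    key_queue.put((sent_idx, idx_list))
key_queue.put(None)  # sentinel to stop the worker

# Produce: a worker turns index lists into batches on data_queue.
def worker():
    while True:
        item = key_queue.get()
        if item is None:
            data_queue.put(None)
            return
        sent_idx, idx_list = item
        data_queue.put((sent_idx, [dataset[i] for i in idx_list]))

t = threading.Thread(target=worker)
t.start()

# Consume: a single reader fetches finished batches into data_buffer.
while True:
    item = data_queue.get()
    if item is None:
        break
    sent_idx, batch = item
    data_buffer[sent_idx] = batch
t.join()

print(data_buffer[0])  # ['sample-0', 'sample-1']
```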