Concurrency in Python

Check out the wonderful material provided by Raymond Hettinger.

Prior Knowledge

Global Interpreter Lock (GIL)

CPython has a lock for its internal shared global state.

The unfortunate effect of the GIL is that no more than one thread can execute Python bytecode at a time.

For I/O-bound work the GIL is not an issue; for CPU-bound work, threading makes the application slower, which makes multiprocessing the better choice.

A problem is I/O bound if it would go faster if the I/O subsystem were faster, e.g. disk, networking, communication. For example, a program that looks through a huge file for some data is typically I/O bound, since the bottleneck is reading the data from disk.

A problem is CPU bound if it would go faster with a faster CPU, i.e. it spends the majority of its time simply using the CPU (doing calculations). For example, a program that computes new digits of pi will typically be CPU bound.
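
A minimal sketch of the contrast (the timings in the comments are illustrative, assuming CPython):

import time
from concurrent.futures import ThreadPoolExecutor

def io_task():
    time.sleep(1)  # stand-in for real I/O; waiting releases the GIL

def cpu_task():
    sum(i * i for i in range(10_000_000))  # pure-Python loop; holds the GIL

for task in (io_task, cpu_task):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(4):
            pool.submit(task)
    print(task.__name__, round(time.perf_counter() - start, 2))

# io_task: ~1s total, because the four waits overlap.
# cpu_task: roughly 4x a single run, because the GIL serializes the threads.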

Threads vs. Processes

Shared state is both the strength and the weakness of threads; the weakness shows up as race conditions that must be managed.

A race condition occurs when multiple threads access shared data and try to change it at the same time.
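
For example, a minimal sketch of a lost-update race on a shared counter:

import threading

counter = 0

def worker():
    global counter
    for _ in range(100_000):
        counter += 1  # read-modify-write is not atomic; updates can be lost

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # frequently less than the expected 400000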

The strength of processes is their independence from one another. The weakness is that communication between them is harder.

Threads vs. Async

For threads, critical sections have to be guarded with locks.
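
For instance, a sketch of guarding the lost-update race shown earlier with a lock:

import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:  # only one thread at a time inside the critical section
            counter += 1

The with statement acquires the lock on entry and releases it on exit, even if an exception is raised.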

Async switches cooperatively (at explicit await points), so locks and other synchronization primitives are generally no longer needed.
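
A minimal asyncio sketch; control only changes hands at the explicit await:

import asyncio

async def fetch(name, delay):
    print(name, "start")
    await asyncio.sleep(delay)  # explicit switch point: control returns to the event loop
    print(name, "done")
    return name

async def main():
    # Between awaits each coroutine runs uninterrupted, so no locks are needed.
    results = await asyncio.gather(fetch("a", 1), fetch("b", 1))
    print(results)  # ['a', 'b'], after ~1s rather than ~2s

asyncio.run(main())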

Rule of Thumb on What to Use

  • Async maximizes CPU utilization because it has less overhead than threads.
  • Threading typically works with existing code and tools as long as locks are added around critical sections.
  • For complex systems, async is much easier to get right than threads with locks.
  • Threads require very little tooling (locks and queues).
  • Async needs a great deal of tooling (futures, event loops, and non-blocking versions of just about everything).

What is non-blocking?

Threading

Check out the official documentation of the threading module.

Thread Objects

Once a thread object is created, its activity must be started by calling the thread's start() method, which arranges for the run() method to be invoked in a separate thread of control.

A thread can be flagged as a "daemon thread". The significance of this flag is that the entire Python program exits when only daemon threads are left.
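
A short sketch of both behaviors:

import threading
import time

def work(seconds):
    time.sleep(seconds)
    print(f"finished after {seconds}s")

t = threading.Thread(target=work, args=(1,))
t.start()   # run() is invoked in a new thread of control
t.join()    # wait for it to finish

d = threading.Thread(target=work, args=(60,), daemon=True)
d.start()
# The program exits here without waiting: daemon threads do not keep it alive.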

Combining Threading and Forking

Wiki on Fork-join model

An interesting blog on forking

Start methods, e.g. fork, spawn ...

Linux Programmer's Manual on FORK

The general rule is: thread after you fork, not before. Otherwise, the locks used by the thread executor get duplicated across processes. If one of those processes dies while it holds the lock, all of the other processes using that lock will deadlock. (A sketch of the safe ordering follows the list below.)

  • fork() copies (copy-on-write) essentially everything in memory.
  • fork() does not copy everything, though: for example, only the calling thread exists in the child; other threads are not duplicated.
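
A minimal sketch of the safe ordering; using the "spawn" start method avoids fork() (and thus the lock-duplication problem) entirely:

import multiprocessing as mp
import threading

def child():
    print("in child process")

def thread_work():
    print("in thread")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # start children without fork(), so no locks are copied

    # Fork/spawn first ...
    p = mp.Process(target=child)
    p.start()
    p.join()

    # ... then thread.
    t = threading.Thread(target=thread_work)
    t.start()
    t.join()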

Async

Here is a wonderful blog on asynchronous Python.

Building Blocks

  • The event loop manages and distributes the execution of different tasks.
  • Coroutines are special functions that work similarly to Python generators: on await they release the flow of control back to the event loop. A coroutine needs to be run on the event loop; once scheduled, it is wrapped in a Task, which is a type of Future. (See the sketch after this list.)
  • Futures represent the result of a task that may or may not have been executed. That result may be an exception.
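
A minimal sketch tying the three building blocks together:

import asyncio

async def compute(x):                        # a coroutine
    await asyncio.sleep(0.1)                 # awaiting yields control to the event loop
    return x * x

async def main():
    task = asyncio.create_task(compute(3))   # schedule the coroutine as a Task
    print(isinstance(task, asyncio.Future))  # True: Task is a subclass of Future
    print(await task)                        # 9

asyncio.run(main())                          # the event loop drives everything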

Multiprocessing

Pickle

Read the official documentation on what can and cannot be pickled.

Some objects cannot be pickled, for example:

  • inner function / closure,
  • inner class,
  • lambda function,

The dill package can be used to check whether an object can be pickled:

import dill
dill.pickles(obj)  # returns True if dill can pickle obj
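
Note that dill can serialize more than the standard pickle module (e.g. lambdas), so dill.pickles is a check against dill itself. For the standard pickle used by multiprocessing, one simple check is just attempting a dump (picklable here is a hypothetical helper):

import pickle

def picklable(obj):
    """Return True if obj survives a standard-pickle dump."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

print(picklable([1, 2, 3]))    # True
print(picklable(lambda x: x))  # False: lambdas cannot be pickled by pickle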

Handle Stateful Objects

If pickling some attributes of an instance is too expensive, it is recommended to remove them from the instance's __dict__ (in __getstate__) and recover them later. Recovery can be done in the __setstate__ method or in another user-defined method.
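
A minimal sketch (Dataset, its cache attribute, and _load are hypothetical):

import pickle

class Dataset:
    def __init__(self, path):
        self.path = path
        self.cache = self._load(path)  # expensive to build and to pickle

    def _load(self, path):
        return {"rows": []}  # stand-in for real loading

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["cache"]             # drop the expensive attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.cache = self._load(self.path)  # recover it on unpickling

clone = pickle.loads(pickle.dumps(Dataset("/tmp/data")))
print(clone.cache)  # rebuilt on unpickling, never serialized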

Read gluon.data.dataloader

Dive deep into real-world code and learn. What are Queue, SimpleQueue, and Pipe?
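
A rough answer, as a sketch: Pipe is a pair of connection endpoints, Queue adds buffering (with a background feeder thread), and SimpleQueue is a simpler locked pipe:

import multiprocessing as mp

def worker(conn, q, sq):
    conn.send("via Pipe")      # Pipe: one of two connection endpoints
    q.put("via Queue")         # Queue: buffered, multi-producer/multi-consumer
    sq.put("via SimpleQueue")  # SimpleQueue: locked pipe, no feeder thread

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    q, sq = mp.Queue(), mp.SimpleQueue()
    p = mp.Process(target=worker, args=(child_conn, q, sq))
    p.start()
    print(parent_conn.recv(), q.get(), sq.get())
    p.join()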

  • Prefetch

Call _push_next() repeatedly to push (sent_idx, idx_list) onto key_queue.

  • Produce

Get an item from key_queue and generate a batch, then put (sent_idx, batch) into data_queue.

  • Consume

Fetch an item from data_queue and push it into data_buffer.

Multiple processes are used to prepare the data, and a single thread fills data_buffer and reads from it.
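
A stripped-down sketch of this prefetch/produce/consume pattern (the queue names follow the description above; the batch computation is a stand-in, not the real gluon implementation):

import multiprocessing as mp
import threading

def produce(key_queue, data_queue):
    # Produce: get (sent_idx, idx_list) from key_queue, build a batch
    while True:
        item = key_queue.get()
        if item is None:  # sentinel: shut down
            break
        sent_idx, idx_list = item
        batch = [i * 10 for i in idx_list]  # stand-in for real batch loading
        data_queue.put((sent_idx, batch))

if __name__ == "__main__":
    key_queue, data_queue = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=produce, args=(key_queue, data_queue))
               for _ in range(2)]
    for w in workers:
        w.start()

    # Prefetch: push (sent_idx, idx_list) pairs ahead of consumption
    for sent_idx in range(4):
        key_queue.put((sent_idx, list(range(sent_idx, sent_idx + 3))))

    # Consume: a single thread fetches from data_queue into data_buffer;
    # keying by sent_idx tolerates out-of-order arrival
    data_buffer = {}
    def consume():
        for _ in range(4):
            sent_idx, batch = data_queue.get()
            data_buffer[sent_idx] = batch

    t = threading.Thread(target=consume)
    t.start()
    t.join()
    print(data_buffer)

    for _ in workers:
        key_queue.put(None)
    for w in workers:
        w.join()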
