Python Coroutines and the Cloud


If you commonly use Python, you’ve probably noticed one of the language’s biggest disadvantages: it has no true parallelism within a single process, because the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. However, we can still achieve concurrency and a degree of virtual parallelism using threads, processes, and/or coroutines. In this post, we’ll cover how to achieve concurrency using coroutines.

We’ll cover:

  • What Coroutines Are
  • What Concurrency is
  • A History of Coroutines in Python
  • Multiprocessing in Python
  • Threading in Python
  • Cooperative Multitasking
  • Coroutines from Generators
  • When We Should Use Coroutines
  • How to Create Your Own Coroutines
  • A Summary of Python Coroutines
  • How Coroutines Are Used in Jetpack

What Are Coroutines?

Visual of running three coroutines concurrently: notice that processing for a coroutine only happens while the other tasks are waiting for external resources.

Let’s say you want to send and receive multiple web requests. When you’re running a function that does that, it only needs processing time for part of the runtime. For another part of the runtime, you’re waiting on the server to respond to your request. The most efficient way to switch which function is running and which functions are waiting is to use coroutines.

Coroutines are special functions used to achieve concurrency in Python via cooperative multitasking. Unlike threads and processes, coroutines do not require duplicating the heap and stack. You can use coroutines to run multiple functions that need to wait for I/O on the same thread. To make this work, the scheduler has to track when each coroutine is waiting for I/O and when it is ready for processing time.

A History of Coroutines in Python

The most basic way to run multiple tasks is to run them in order, however, this strategy is hugely time inefficient. We can speed up our processing time and increase our task throughput by adopting strategies to run tasks at the same time. These strategies include using multiple processes, multiple threads, and coroutines.

Coroutines are a sophisticated way to achieve concurrency. In modern Python they are defined using the async and await keywords and driven by the asyncio library. Python’s coroutines have evolved over a long period of time, with their first appearance, as generator-based coroutines, in Python 2.5.

Multiprocessing in Python

The most basic strategy for achieving parallel execution of multiple tasks is to use multiple workers, each in its own process, which we can do in Python with the `multiprocessing` library. We can create multiple workers that each execute a single task or a set of tasks in series. Throughput increases roughly in proportion to the number of workers running.
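As a minimal sketch of this worker-per-process model (the `square` task is a hypothetical example, not from the text above):

```python
import multiprocessing

def square(n: int) -> int:
    # A CPU-bound task that each worker executes independently.
    return n * n

if __name__ == "__main__":
    # Four workers process the inputs in parallel, each in its own process.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because each worker is a separate process, the inputs and outputs of `pool.map` are pickled and unpickled behind the scenes, which is exactly the serialization cost discussed below.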

Although we can scale multiple processes, we are still left with the problem of idle time. Each task, such as the web request example above, still has idle time where processing power is unused. Not only is there still idle time, but we have to serialize and deserialize data when communicating between processes. To add to that, we have to duplicate the heap, stack, and compiled byte code for each process.

Threading in Python

Another approach to the multiple-worker solution is to use multiple threads instead of multiple processes, which we can do in Python with the `threading` library. Using threads instead of processes provides a space complexity advantage: we no longer have to duplicate the heap or byte code for each added worker; we just need a new stack. This also means we don’t need to serialize or deserialize data when communicating between tasks.
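A minimal sketch of the same idea with threads (the `record` task is a hypothetical example): note that the workers append to one shared list directly, with no serialization required, guarded by a lock.

```python
import threading

results = []
lock = threading.Lock()

def record(task_id: int) -> None:
    # Threads share the same heap, so they can append to one list directly;
    # the lock guards the shared state against concurrent modification.
    with lock:
        results.append(task_id)

threads = [threading.Thread(target=record, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```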

However, because of the GIL, we can only run one thread at a time. This means that the other workers have to sit idle while one thread is running. We can’t achieve true parallelism with threads the way we can with processes.

On top of this, the operating system is in charge of scheduling threads, but it has no insight into what each thread is doing, and switching between threads is costly: every time we switch, we have to save the current thread’s stack and load the next thread’s stack. This is called preemptive multitasking. The drawback is that a context switch is likely to land in the middle of processing rather than during wait time, resulting in suboptimal behavior.

Cooperative Multitasking

It’s clear that neither using threads nor using processes results in optimal concurrency behavior, so what does? To achieve optimal concurrency, we want cooperative multitasking.

Cooperative multitasking is achieved when tasks run through their critical processing phases and only switch to another task when they are waiting on external resources or are done processing. The tradeoff is that cooperative multitasking only works when each task is well mannered and cooperates with all other tasks to share CPU time and resources.

We can create tasks that are fit for cooperative multitasking by making sure they yield control when waiting on external resources. Building cooperative tasks this way allows us to run them concurrently on one thread in one process. Once we have cooperative tasks like this, we can run them concurrently using an event loop.
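A minimal sketch of what such a cooperative task might look like (the `CountdownTask` class and the scheduling loop are hypothetical illustrations): each task exposes a `run` method that does one slice of work and then voluntarily returns control.

```python
class CountdownTask:
    # A hypothetical cooperative task that does its work in small slices.
    def __init__(self, name: str, steps: int):
        self.name = name
        self.remaining = steps

    def done(self) -> bool:
        return self.remaining == 0

    def run(self) -> None:
        # One slice of work; the task must track its own progress because
        # run() starts from the top on every call.
        print(f"{self.name}: {self.remaining} steps left")
        self.remaining -= 1

# A bare-bones scheduling loop: give each unfinished task a turn.
tasks = [CountdownTask("a", 2), CountdownTask("b", 2)]
while tasks:
    for task in list(tasks):
        task.run()
        if task.done():
            tasks.remove(task)
```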

However, this basic template is incomplete. Each task has to keep track of its own progress, yet its run implementation starts from the beginning on every call. It’s also difficult to call other functions that may themselves need to wait on resources. This is where Python’s generators come in. Generators are functions that yield multiple values over time and resume executing where they left off.
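For illustration, here is a minimal sketch of an event loop that round-robins generator-based tasks (the `task` and `run` names are made up for this example); because generators resume where they left off, the tasks no longer need to track their own progress by hand:

```python
from collections import deque

def task(name: str, steps: int):
    # A cooperative task as a generator: each yield hands control
    # back to the event loop, and next() resumes right after it.
    for i in range(steps):
        yield f"{name}: step {i}"

def run(tasks):
    # A minimal round-robin event loop over generator-based tasks.
    schedule = []
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            schedule.append(next(current))  # resume until the next yield
            queue.append(current)           # reschedule behind the others
        except StopIteration:
            pass                            # the task finished; drop it
    return schedule

print(run([task("a", 2), task("b", 2)]))
# ['a: step 0', 'b: step 0', 'a: step 1', 'b: step 1']
```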

The analogous behavior of waiting for external resources can be seen in generators through the `send` method, which lets us send values into a generator object until it completes. Voilà, we’ve just discovered coroutines. In fact, the first official appearance of coroutines in Python is outlined in PEP 342, “Coroutines via Enhanced Generators,” which introduced the `send` method.
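As a sketch of this `send`-based style, here is the classic running-average generator coroutine (a hypothetical example, not from PEP 342 itself): it pauses at each yield, "waiting" until the caller sends in the next value.

```python
def averager():
    # A PEP 342-style generator coroutine: it suspends at each yield
    # and resumes, state intact, when the caller sends in a new value.
    total, count = 0, 0
    average = None
    while True:
        value = yield average   # pause here, waiting for outside input
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)            # prime the coroutine: advance it to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
```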

When Should We Use Coroutines?

| Functional description | Should it be a coroutine? |
| --- | --- |
| Sending web requests | Yes |
| Multiplying numbers | No |
| Movements in a video game (combat in a multiplayer game requires multiple concurrent network calls to get each player’s stats, followed by the calculation and updates) | Yes |

In distributed systems, we can use coroutines to speed up concurrent I/O operations, such as API requests or database reads/writes. For example, if you had a service that required the output of multiple Kubernetes jobs or tasks, you could use coroutines to run the jobs concurrently in your cluster.

How to Create Your Own Coroutines

In Python, coroutines are defined with the async and await keywords and run with the asyncio library. As is consistent with Python’s syntax style, these keywords make perfect sense for defining coroutine functions.

The async keyword in front of a function definition marks the function as a coroutine. The await keyword in front of a line of code in the function marks a place to yield control and wait to be scheduled. Let’s take a look at a naive example of coroutines by running two “Hello World” functions concurrently.

We’ll first need to create a hello_world function. The function will simulate waiting for I/O using the asyncio.sleep() function. Our hello_world coroutine will print “Hello”, await (that is, yield control of the thread), and then print “World”. We’ll also create a main function that will gather and run the hello_world function twice.

import asyncio

async def hello_world():
	print("Hello")
	await asyncio.sleep(3)
	print("World")

async def main():
	await asyncio.gather(hello_world(), hello_world())

asyncio.run(main())

Let’s think about the behavior of these coroutines. What do we expect to happen? First, the first hello_world function has control and it prints “Hello”. Next, it reaches the await keyword, so it gives up control of the thread until the `await`ed function finishes. Once the first hello_world function gives up control, the second hello_world function executes.

It does the same thing: it prints “Hello”, then yields control of the thread. After the first hello_world finishes waiting, it prints “World” and yields control again. Finally, the second hello_world function prints “World” and the program ends. We should see “Hello”, “Hello”, “World”, “World” printed in that order.

How We Can Use Coroutines to Run Tasks Concurrently in the Cloud

In the example above we used coroutines to run concurrent tasks within the same process. However, we can also use coroutines to run functions or tasks in a distributed setting. For example, if you wanted to make multiple API calls, or had to wait for data to load from a database or blob storage, you could use coroutines to execute these tasks in the background, without blocking execution.
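As an illustrative sketch (the `fetch_record` function is hypothetical and simulates I/O with `asyncio.sleep`), we can launch such calls as background tasks with `asyncio.create_task` and collect the results only when we need them:

```python
import asyncio

async def fetch_record(key: str) -> str:
    # Simulates a non-blocking I/O call, such as an API request
    # or a database read.
    await asyncio.sleep(0.1)
    return f"value-for-{key}"

async def main() -> None:
    # Launch the fetches as background tasks; they start running
    # immediately while main is free to do other work.
    tasks = [asyncio.create_task(fetch_record(k)) for k in ("a", "b", "c")]
    print("doing other work while the fetches run...")
    # Retrieve the results only when we actually need them.
    results = await asyncio.gather(*tasks)
    print(results)  # ['value-for-a', 'value-for-b', 'value-for-c']

asyncio.run(main())
```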

At Jetpack, our SDK runs ordinary Python functions as distributed cloud functions (which we call “jetroutines”) using Python’s asyncio library. By running these functions as jetroutines, developers can launch and suspend the execution of their cloud functions in the background, and then retrieve the result from our runtime when they need it. Developers can also spawn multiple jetroutines concurrently using asyncio.gather() from the standard library.

import asyncio
from typing import Any, Dict

@jetroutine
async def background_job(msg: str) -> str:
    return msg.capitalize()


@jetroutine
async def background_job_two(msg: str) -> str:
    return msg.swapcase()


async def launch_job(msg: str) -> Dict[str, Any]:
    results = await asyncio.gather(
        background_job(msg),
        background_job_two(msg))
    return {"message1": results[0], "message2": results[1]}

Summary of Python Coroutines

The first thing we learned in this post about coroutines is that they are special functions used to achieve concurrency via cooperative multitasking. Coroutines are well mannered functions that yield control when they’re waiting for external resources. They arose from a need to optimize the execution of multiple tasks.

Next, we learned that the initial solutions for executing multiple tasks were parallelism with processes and concurrency with threads. However, neither of these solutions provides ideal efficiency. This is where coroutines come in: they are more efficient because they yield control at exactly the points where they would otherwise be waiting.

Then we learned about when to use coroutines. Coroutines are useful whenever we have functions that consist of both pure processing needs and the need to wait for external resources. Finally, we went over an example of how coroutines work using a “Hello World” example that simulates waiting for external resources.