Horizontally Scaling Your Service with Kubernetes Pods

Your app is starting to pick up and get traffic! When you first launch, traffic is usually light, and it’s totally fine to have everything on one machine at the beginning. As you grow and more people hear about your application, traffic gets spikier and heavier, and you need a plan for handling it. How do you most efficiently set up your servers? How do you keep responses consistent across instances? What about data integrity?

We’ll answer all of those and more in this post on how to scale your service horizontally using Kubernetes Pods. In this post, we’ll cover:

  • What is Horizontal Scaling?
  • When Should You Scale Your App Horizontally?
  • Considerations in Horizontal Scaling with Kubernetes Pods
  • Dealing with Caching and the Data Layer when Horizontally Scaling with Kubernetes Pods
  • Statelessness of Kubernetes Pods
  • Ensuring Efficient Horizontal Scaling with Kubernetes Pods
  • Distributed Cache Example for Horizontal Scaling using Redis
  • Set Up the Distributed Cache and API Backend
  • Function to Download and Save Data to the Distributed Cache
  • Data Query Function
  • Full Code for the Distributed Cache Example
  • Expected Behavior of the Distributed Cache Example
  • Summary of Horizontal Scaling with Kubernetes Pods

What is Horizontal Scaling?

Before we dive into how you can leverage Kubernetes Pods to scale your service horizontally, let’s talk about what horizontal scaling is.

Broadly speaking, you can scale your service or application horizontally or vertically. Vertical scaling adds more power to your current system. It involves either increasing the compute power (CPU) or memory (RAM) of your server(s). Horizontal scaling adds more instances to your system.

Both aim to increase scalability, that is, the number of requests your application can support simultaneously, by adding compute resources. So, how do you know when to scale horizontally or vertically?

When Should You Scale Your App Horizontally?

In many ways, scaling vertically is simpler than scaling horizontally. When scaling horizontally, you have to add machines, which means you have to think about how to keep data consistent across multiple instances. What about the strengths of scaling horizontally, though?

Two of the biggest benefits of scaling your service horizontally are increased flexibility and uptime. Adding more replicas makes it less likely that your application will have downtime: if one replica is down, the others likely aren’t. Having multiple server instances ready to spin up or down also lets you absorb spiky traffic. This is especially useful when your application is starting to get popular, or when usage increases at certain times of the day, week, month, or year.

Any application that gets big enough will have to deal with horizontal scaling eventually; one server can only handle so much traffic. Let’s take a deeper dive into how we can handle the complexities of horizontal scaling.

Considerations in Horizontal Scaling with Kubernetes Pods

One tool you can use when scaling horizontally is Kubernetes Pods. Pods are the smallest deployable compute unit in Kubernetes: a group of one or more containers that share storage and network resources. You can think of each Pod as its own server instance. Let’s take a look at some of the things we should consider when scaling horizontally using Kubernetes Pods.
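
Although Pods are the unit we scale, in practice you usually manage them through a Deployment and simply change its replica count. Below is a minimal sketch using the official kubernetes Python client; the Deployment name my-service and the default namespace are illustrative assumptions, not values from this post.

# A minimal sketch, assuming a Deployment named "my-service" in the "default"
# namespace and a local kubeconfig (pip install kubernetes)
from kubernetes import client, config

config.load_kube_config()   # load credentials from the local kubeconfig
apps = client.AppsV1Api()

# Scale the hypothetical Deployment out to 5 replicas, i.e. 5 Pods
apps.patch_namespaced_deployment_scale(
    name="my-service",
    namespace="default",
    body={"spec": {"replicas": 5}},
)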

Dealing with Caching and the Data Layer When Horizontally Scaling with Kubernetes Pods

Almost every service needs to store data locally, or cache data, at some point in the workflow. One of the challenges of scaling horizontally is ensuring consistency in the data layer. Each Kubernetes Pod has its own set of resources and storage, and a separate data store per Pod can lead to inconsistent saves and reads.

To get around the problem of sharing data between Kubernetes Pods, we can shift to a different storage solution. Instead of caching data locally on each Pod, we can use a distributed cache like Redis.

Using a distributed cache allows all replicas to save and read data from the same cache, which ensures that they all return the same results. Beyond consistency, another advantage of a distributed cache is that we only have to download the data once, into the cache, instead of separately in each instance.

One of the downsides of this architecture is that it can result in slower access times to the cached data. We can minimize this slowdown by hosting our distributed cache within the same cluster and namespace as our service Pods. Internal network calls within a cluster are quicker than external ones because traffic between Pods in the same cluster stays on the cluster’s internal network.
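
As a concrete illustration, if Redis is exposed through a Kubernetes Service in the same namespace as our service Pods, the app can reach it at a cluster-internal DNS name. The Service name redis and the default namespace below are assumptions for illustration only.

# A minimal sketch, assuming a Redis Service named "redis" in the "default"
# namespace; the .svc.cluster.local address keeps cache traffic on the
# cluster's internal network
from redis import StrictRedis

cache = StrictRedis(
    host="redis.default.svc.cluster.local",  # hypothetical in-cluster Service name
    port=6379,
)
cache.set("warmup", "ok")
print(cache.get("warmup"))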

Statelessness of Kubernetes Pods

Kubernetes Pods are “designed to be ephemeral, disposable entities”. They are built for horizontally scaling a stateless system. A stateless instance keeps no information about the system’s past state and stores no data of its own. Statelessness is useful because it keeps every replica consistent, and it means a Pod crashing or being replaced doesn’t lose anything.

Each newly started Kubernetes Pod that runs our service will have a new, empty filesystem. Since we want to access cached data, we need to offload the state of the system to a cache or database. This keeps our service’s Pods stateless and allows them to easily scale horizontally.
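
As a small illustration of offloading state, the sketch below keeps a request counter in Redis rather than in a per-Pod variable; the key name is made up for the example. Whichever replica handles the request increments and reads the same value.

# A minimal sketch of offloading state: a per-Pod, in-memory counter would be
# lost on restart and invisible to other replicas, so the count lives in Redis
import os
from redis import StrictRedis

r = StrictRedis(host=os.environ.get("REDIS_HOST", "localhost"), port=6379)

def record_request(user_id: str) -> int:
    # INCR is atomic, so concurrent replicas never lose an update
    return r.incr(f"requests:{user_id}")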

Naturally, this raises the question: how do we host the cache? We can’t use a plain Kubernetes Pod because of its ephemeral nature. Instead, we use a Kubernetes StatefulSet with a PersistentVolume. A PersistentVolume is storage that lives outside the Pod and survives Pod restarts, so even if all the cache Pods go down, the cached data stays on hand.

Distributed Cache Example for Horizontal Scaling using Redis

Let’s create an example to better understand how we can utilize distributed caching with Redis. In this example, we’ll create a FastAPI backend that uses a Redis cache. Our example will get the top 1000 GitHub repos for a given tag and cache them in Redis.

We will create two example functions: one to download and cache the data, and another to query for the data. To follow along, you’ll need to install the fastapi, opendatasets, and redis libraries. You can do that with the line below in the terminal:

pip install fastapi opendatasets redis

Note that you’ll also have to set the REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD environment variables for working with Redis.

Set Up the Distributed Cache and API Backend

The first thing we’ll do in our example is set up our distributed cache and API backend. We’ll start by importing the libraries we need. Once everything is imported, we create our FastAPI app, read the Redis connection values from the environment and set up the Redis client, and declare our constants.

import json
import os

from fastapi import FastAPI, HTTPException
import opendatasets as od
from redis import StrictRedis

app = FastAPI()

# Connect to Redis
host = os.environ.get("REDIS_HOST")
port = os.environ.get("REDIS_PORT")
password = os.environ.get("REDIS_PASSWORD")
if host is None or port is None:
    raise ValueError("REDIS_HOST and REDIS_PORT environment variables must be set")
r = StrictRedis(host=host, port=int(port), password=password)

# List of valid tags the user can look up
valid_tags = [
    'cpp', 'pytorch', 'vue-js', 'nextjs', 'golang', 'django', 'r', 'julia',
    'kubernetes', 'react-js', 'flask', 'angular', 'scikit', 'scala', 'docker',
    'oops', 'react-native', 'dart', 'machine-learning', 'nodejs', 'tensorflow',
    'deep-learning', 'kotlin', 'gatsby'
]

# Variables for downloading the dataset
dirname = "top-1000-github-repositories-for-multiple-domains"
DATA_URL = "https://www.kaggle.com/anshulmehtakaggl/top-1000-github-repositories-for-multiple-domains/"
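
Before wiring up the routes, it can help to confirm the connection details are correct; this is an optional sanity check rather than part of the original example.

# Optional sanity check: PING returns True if the Redis connection works
print(r.ping())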

Function to Download and Save Data to the Distributed Cache

Now that we’ve set everything up, we’ll create our caching function. This function doesn’t require any parameters. It downloads the dataset from the data URL we saved earlier, creates a Redis pipeline, and then uses that pipeline to cache each file.

# Function to download and cache the dataset if it’s not already available
def update_cache_from_kaggle():
    # Use opendatasets to download the dataset to a local directory
    od.download(DATA_URL)
    pipe = r.pipeline()

    # Cache all the files in Redis, using the filename as the key
    for file in os.listdir(dirname):
        with open(os.path.join(dirname, file), "r") as f:
            try:
                data = json.load(f)
            except json.JSONDecodeError:
                print(f"Couldn't read file {file}")
                continue
        key = file.split(".")[0].lower()
        # Expire each entry after an hour (3600 seconds)
        pipe.set(key, json.dumps(data), ex=3600)
    pipe.execute()
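
For a quick sanity check of this function, you can warm the cache once and confirm that a known tag is present. This assumes Kaggle credentials are configured for opendatasets, and that the dataset contains a file whose name matches the flask tag.

# Warm the cache once, then confirm the "flask" dataset file was cached
# (assumes a flask.json file exists in the downloaded dataset)
update_cache_from_kaggle()
print(r.exists("flask"))   # 1 if the key is present
print(r.ttl("flask"))      # remaining time-to-live, at most 3600 seconds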

Data Query Function

Finally, we come to the function that actually queries the service. This function requires one parameter, the tag name. It starts by checking that we have a valid tag and returns a 404 otherwise. If the tag is valid, we either retrieve the data directly from our distributed cache or, on a cache miss, call the update function we created above and then read the data from Redis.

# This route returns GitHub data for the top 1000 GitHub repos for a given tag
@app.get("/github_stats/{tag}")
async def get_tag(tag: str):
    if tag not in valid_tags:
        # Raise an HTTPException so the response carries a real 404 status code
        raise HTTPException(status_code=404, detail="Tag not valid")

    # On a cache miss, download the dataset and populate the cache first
    if not r.exists(tag):
        update_cache_from_kaggle()

    # The cached value is a JSON string, so parse it before returning
    return json.loads(r.get(tag))

Full Code for the Distributed Cache Example

Here’s the full code for our example of using a distributed cache with a FastAPI backend.

import json
import os

from fastapi import FastAPI, HTTPException
import opendatasets as od
from redis import StrictRedis
 
app = FastAPI()
 
# Connect to Redis
host = os.environ.get("REDIS_HOST")
port = os.environ.get("REDIS_PORT")
password = os.environ.get("REDIS_PASSWORD")
if host is None or port is None:
    raise ValueError("REDIS_HOST and REDIS_PORT environment variables must be set")
r = StrictRedis(host=host, port=int(port), password=password)
 
# List of valid tags the user can look up
valid_tags = [
    'cpp', 'pytorch', 'vue-js', 'nextjs', 'golang', 'django', 'r', 'julia',
    'kubernetes', 'react-js', 'flask', 'angular', 'scikit', 'scala', 'docker',
    'oops', 'react-native', 'dart', 'machine-learning', 'nodejs', 'tensorflow',
    'deep-learning', 'kotlin', 'gatsby'
]
 
# Variables for downloading the dataset
dirname = "top-1000-github-repositories-for-multiple-domains"
DATA_URL = "https://www.kaggle.com/anshulmehtakaggl/top-1000-github-repositories-for-multiple-domains/"
 
# This route returns GitHub data for the top 1000 GitHub repos for a given tag
@app.get("/github_stats/{tag}")
async def get_tag(tag: str):
    if tag not in valid_tags:
        # Raise an HTTPException so the response carries a real 404 status code
        raise HTTPException(status_code=404, detail="Tag not valid")

    # On a cache miss, download the dataset and populate the cache first
    if not r.exists(tag):
        update_cache_from_kaggle()

    # The cached value is a JSON string, so parse it before returning
    return json.loads(r.get(tag))
 
# Function to download and cache the dataset if it’s not already available
def update_cache_from_kaggle():
    # Use opendatasets to download the dataset to a local directory
    od.download(DATA_URL)
    pipe = r.pipeline()

    # Cache all the files in Redis, using the filename as the key
    for file in os.listdir(dirname):
        with open(os.path.join(dirname, file), "r") as f:
            try:
                data = json.load(f)
            except json.JSONDecodeError:
                print(f"Couldn't read file {file}")
                continue
        key = file.split(".")[0].lower()
        # Expire each entry after an hour (3600 seconds)
        pipe.set(key, json.dumps(data), ex=3600)
    pipe.execute()

Expected Behavior of the Distributed Cache Example

The first time a tag is requested, the example downloads the GitHub repo data and caches it in Redis. A call that hits any other instance of the service can then read those repos from the cache instead of downloading them again. The shared cache also ensures that the service returns the same result regardless of which replica processes the request. The end result is that no matter how much traffic we get, we can continue to horizontally scale our service using stateless Pods.
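
To see this behavior for yourself, you can serve the app with an ASGI server such as uvicorn and hit the endpoint twice. The snippet below is a sketch that assumes the app is served locally on port 8000 and that the requests library is installed separately; the first call triggers the Kaggle download, so it is slow, while the second is served straight from Redis.

# Hit the endpoint twice and compare response times (assumes the app is
# running locally, e.g. "uvicorn main:app --port 8000", and requests is installed)
import time
import requests

for attempt in range(2):
    start = time.perf_counter()
    resp = requests.get("http://localhost:8000/github_stats/flask")
    elapsed = time.perf_counter() - start
    print(f"Attempt {attempt + 1}: status={resp.status_code}, took {elapsed:.2f}s")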

Summary of Horizontal Scaling with Kubernetes Pods

In this post, we went over horizontal scaling and how to do it using Kubernetes Pods. Horizontal scaling is the process of scaling your service by adding more compute instances to your resource pool.

We also discussed some of the challenges you could face while scaling horizontally. Most notably, we discussed dealing with the challenge of data consistency. We needed a solution that not only keeps data consistent between instances, but also handles the statelessness of Kubernetes Pods.

With these considerations and challenges in mind, we approached the problem by using a distributed cache, Redis. To wrap things up, we went over a simple example of how to set up Redis with a FastAPI backend.