Use semaphores to limit processing

How we use semaphores to prevent OOMKill in a memory hungry microservice


Two months ago, I was busy building a system to automatically upload files sent via our company messaging system to a malware detection API. The system has to respond within 10 seconds, otherwise the file is automatically distributed to recipients. Everything was fine during testing. Just after the rollout, however, we started to see our Go microservice fall apart, restarting again and again.

📊 Death by PowerPoint

During testing I only worked with small files, because the upload limit on the test messaging instance was set to 5MB and because it was faster. However, on the production system the threshold is much higher: 50MB per file. My poor k8s pods and their 20MB RAM limit were quickly overwhelmed by users sharing large PowerPoint files.

So, first: increase the RAM limit and scale up the deployment to 8 replicas instead of 2. That would give me some time to handle the next issue.

🐜 What's next?

Next was the fact that a single file upload was limited to 50MB but users could upload several files at once in a single message.

One solution would have been to split the file handling into two parts: for each received message, loop through all the files and send each one to a queue system to be handled by subscribers. But I was happy with my single, no-dependency binary and I didn't want to introduce complexity unless absolutely necessary.

Also, since the limiting factor in this case was the file size and not the number of files, I wanted to find a solution that would work for either one large file or many small files in parallel.

The idea then was to introduce a semaphore.

🚦 What's that?

A semaphore works a bit like a bag of tokens. If you want to access a section restricted by the semaphore, you request some tokens. If there are enough tokens in the bag, you're allowed to continue. Otherwise, you either wait until there are, or you error out (depending on your use case).

In our case, we set the maximum RAM a pod should use to 200MB and configured the semaphore size to a bit less than 200x1024x1024 (the number of bytes in 200MB). Then, before downloading a file, we send a HEAD request to retrieve its size in bytes without downloading it. We request that many tokens from the semaphore with a timeout of 5 seconds. If the timeout is reached, we reject the file so it's not distributed. Otherwise, we continue down our happy path.

🧑🏻‍💻 What does that look like?

Go has semaphores defined in the "extended" stdlib at https://pkg.go.dev/golang.org/x/sync/semaphore

package main

import (
	"context"
	"log/slog"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	const maxBytes = 200 * 1024 * 1024 // 200MB expressed in bytes
	sem := semaphore.NewWeighted(maxBytes)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	fileSize := int64(12 * 1024) // 12KB, retrieved from a HEAD request

	if err := sem.Acquire(ctx, fileSize); err != nil {
		// err is ctx.Err()
		slog.Error(err.Error())
		return
	}

	defer sem.Release(fileSize)

	// continue processing

	slog.Info("done")
}

Playground: https://go.dev/play/p/rVUK3ftTAPN

📈 And the results?

Not a single OOMKill since this update was deployed! And out of more than 90k files processed in the last 30 days, only two were rejected because of the semaphore timeout, or 0.002%. Not bad!