Context-induced performance bottleneck in Go

Misused Go contexts can lead to severe performance issues.

[Image: Snorlax, a big Pokemon happily sleeping in the grass under the sun - The Pokemon Company]

🏞️ Setting up the… context 🥁

The documentation for the context package states that contexts carry "deadlines, cancellation signals, and other request-scoped values across API boundaries and between processes."

At OVHcloud, we use a logging library which uses contexts to add structured metadata to log messages. One obvious example is adding a request ID to all logs emitted from the scope of a single request.
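To make this concrete, here's a hypothetical sketch of that pattern (the helper name logCtx and the "requestID" key are made up for illustration; our actual library differs in its details): values are stored in the context up front and pulled back out at log time.

package main

import (
	"context"
	"log"
)

// logCtx is a hypothetical helper: it looks up a request ID previously
// stored in the context and prepends it to the log message.
func logCtx(ctx context.Context, msg string) {
	if reqID, ok := ctx.Value("requestID").(string); ok {
		log.Printf("request_id=%s %s", reqID, msg)
		return
	}
	log.Print(msg)
}

func main() {
	ctx := context.WithValue(context.Background(), "requestID", "123")
	logCtx(ctx, "handling request") // logs: request_id=123 handling request
}

Every log line therefore triggers one or more ctx.Value lookups, which is the operation whose cost we'll examine below.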

We noticed slowness in a message-processing microservice and tracked it down to… the context! We were looping over the messages, and on each iteration we set a new ID on the context to help group logs by message, like this:

func processMessages(ctx context.Context, messages chan Message) {
	for msg := range messages {
		ctx = context.WithValue(ctx, "msgID", newMsgID())
		// do something with the message, including logging 
	}
}

🕵🏻‍♂️ Understanding (and fixing) the issue

The issue boils down to the implementation of context.WithValue and Context.Value in Go. context.WithValue wraps the parent context in a new one containing the desired value. So when you "add" two values to an empty context, you actually end up with nested contexts:

flowchart TD
    subgraph "request ID = 123"
        subgraph "foo = bar"
            empty
        end
    end

To extract the value associated with the foo key, you call ctx.Value("foo"). It traverses the nested contexts, checking whether the current one holds a value for the requested key, and stops and returns the value as soon as it finds a match.
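Conceptually, a value-carrying context is just a linked-list node pointing to its parent. Here's a simplified sketch of the idea (an illustration of the mechanism, not the actual standard library code, which differs in its details):

package main

import (
	"context"
	"fmt"
)

// valueCtx mimics the idea behind the standard library's value contexts:
// one key/value pair plus an embedded parent context.
type valueCtx struct {
	context.Context // parent; deadlines and cancellation are delegated to it
	key, val        any
}

func (c *valueCtx) Value(key any) any {
	if c.key == key {
		return c.val // the pair lives at this level
	}
	// Not here: ask the parent, i.e. walk one level up the chain.
	return c.Context.Value(key)
}

func main() {
	ctx := &valueCtx{context.Background(), "foo", "bar"}
	fmt.Println(ctx.Value("foo")) // bar, found after walking one node
}

Each context.WithValue call adds one node to this chain, so a single lookup costs time proportional to how deep the requested key sits.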

Now what happens when processing lots of messages, like in the example above? Well… something like this:

flowchart TD
    subgraph "request ID = random"
        subgraph "request ID = random"
            subgraph "request ID = random"
                subgraph "request ID = random"
                    subgraph "request ID = random"
                        subgraph "..."
                            subgraph "key = val"
                                empty
                            end
                        end
                    end
                end
            end
        end
    end

On each loop iteration, a new context is created wrapping all the previous ones, because we assigned to the outer variable instead of shadowing it inside the loop.

The fix is very easy: use := instead of = to shadow the outer context:

func processMessages(ctx context.Context, messages chan Message) {
	for msg := range messages {
		// notice the `:=` instead of `=` below
		ctx := context.WithValue(ctx, "msgID", newMsgID())
		// ...
	}
}
💡
I wrote a linter to detect this issue. I also opened a PR to include this linter in golangci-lint. I intend to write a follow-up article about this process.

📈 Measurements

The performance delta is staggering. What follows is a snippet that outputs the time each loop iteration takes to retrieve a value from either a shadowed context or a "fat" context.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/gofrs/uuid"
)

const key = "key"

func main() {
	const times = 10_000

	// Setup the value we want to retrieve in each iteration
	ctx := context.WithValue(context.Background(), key, "some-val")

	fat(ctx, times)
	shadow(ctx, times)
}

func fat(ctx context.Context, times uint64) {
	for range times {
		// wrap the context, each iteration makes it bigger
		ctx = contextWithRandom(ctx)

		start := time.Now()
		// simulate the logging lib which retrieves context values
		_ = ctx.Value(key)

		fmt.Printf("fat,%v\n", time.Since(start).Nanoseconds())
	}
}

func shadow(ctx context.Context, times uint64) {
	for range times {
		// shadow the context, each iteration creates a new one and it doesn't grow
		ctx := contextWithRandom(ctx)

		start := time.Now()
		_ = ctx.Value(key)

		fmt.Printf("shadow,%v\n", time.Since(start).Nanoseconds())
	}
}

func contextWithRandom(ctx context.Context) context.Context {
	return context.WithValue(ctx, "other_key", uuid.Must(uuid.NewV4()))
}

Here's a graph plotting the data output by the previous program. After just 2400 iterations, the ctx.Value call in the fat version already takes 10 µs per value. If your logging library retrieves 10 values, that's 0.1 ms spent just retrieving values to emit a single log line. And since the lookup time grows with the number of wrapped contexts, the slowdown keeps increasing, making the service sluggish after just a few hours.

The graph shows a near-constant line close to zero on the y axis for the "shadow" implementation, and a line growing linearly with the loop count for the "fat" implementation.
Time taken to retrieve a single context value after N loop iterations

I can't think of a case where we'd want to use a context value set during a loop iteration after the loop has ended. In fact, in most cases we actively don't want that: with the logging library, values set inside the loop would leak into logs emitted after it. So as a rule of thumb, I now always shadow my contexts inside loops.

Here's a benchmark showing similar results in a more textual way.

package main_test

import (
	"context"
	"fmt"
	"math/rand/v2"
	"testing"
)

const key = "key"

func BenchmarkContext(b *testing.B) {
	benchmarks := []struct {
		times uint64
	}{
		{10},
		{100},
		{1_000},
		{10_000},
		{100_000},
	}
	for _, bm := range benchmarks {
		ctx := context.WithValue(context.Background(), key, "some value")
		b.Run(fmt.Sprintf("shadow %v", bm.times), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				shadow(ctx, bm.times)
			}
		})
		b.Run(fmt.Sprintf("fat %v", bm.times), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				fat(ctx, bm.times)
			}
		})
	}
}

func shadow(ctx context.Context, times uint64) {
	for range times {
		ctx := contextWithRandom(ctx)
		_ = ctx.Value(key)
	}
}

func fat(ctx context.Context, times uint64) {
	for range times {
		ctx = contextWithRandom(ctx)
		_ = ctx.Value(key)
	}
}

func contextWithRandom(ctx context.Context) context.Context {
	return context.WithValue(ctx, "other_key", rand.Int64())
}

The results also show the discrepancy in growth rates between the shadow and fat implementations.

$ go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/Crocmagnon/fat-contexts
BenchmarkContext/shadow_10-8         	 2261456	       763.0 ns/op
BenchmarkContext/fat_10-8            	 1662235	       743.6 ns/op
BenchmarkContext/shadow_100-8        	  236544	      4888 ns/op
BenchmarkContext/fat_100-8           	   53778	     22275 ns/op
BenchmarkContext/shadow_1000-8       	   24499	     48474 ns/op
BenchmarkContext/fat_1000-8          	     711	   1698109 ns/op
BenchmarkContext/shadow_10000-8      	    2472	    489804 ns/op
BenchmarkContext/fat_10000-8         	       6	 170819118 ns/op
BenchmarkContext/shadow_100000-8     	     248	   4938549 ns/op
BenchmarkContext/fat_100000-8        	       1	17150788208 ns/op
PASS
ok  	github.com/Crocmagnon/fat-contexts	31.454s

To learn more about the Go benchmark output format, head over to the documentation. There are a couple of interesting things to note here:

  • the fat implementation is already slower at small loop counts (about 4.5× at 100 iterations);
  • when multiplying the loop count by 10, the shadow implementation gets 10 times slower, as expected, but the fat one gets 100 (=10²) times slower.

This last point is explained by the cost of a single lookup: retrieving the value is a constant-time operation (O(1)) in the shadow variant but proportional to the context depth (O(n)) in the fat variant. Summed over a whole run, the loop costs O(n) for shadow and O(n²) for fat, which matches both the chart above and the ×10 vs ×100 scaling in the benchmark output.
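As a back-of-the-envelope check (an illustration I'm adding here, not part of the original benchmark): at iteration i, the fat lookup walks roughly i wrapped contexts, so a full run of n iterations performs about n(n+1)/2 steps versus n for the shadow variant.

package main

import "fmt"

func main() {
	for _, n := range []int64{1_000, 10_000, 100_000} {
		shadowSteps := n            // one O(1) lookup per iteration
		fatSteps := n * (n + 1) / 2 // lookup at iteration i walks ~i contexts
		fmt.Printf("n=%d shadow≈%d fat≈%d ratio≈%.0f\n",
			n, shadowSteps, fatSteps, float64(fatSteps)/float64(shadowSteps))
	}
}

The predicted ratio grows roughly 10× for each decade of n, which lines up with the 10× vs 100× scaling observed in the benchmark results.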

If you want to reproduce my results, the code is available on GitHub:

GitHub - Crocmagnon/fat-contexts-article-companion: On context-induced performance bottleneck in Go

Thanks for reading!

💡
This article has been featured in the March 29th Cup o' Go episode. Thank you Shay & Jonathan!