Go dependencies and binary size

How do you keep your app binary from being bloated by unused dependencies? And how do you design your library so that it doesn't happen to your users?

Photo by Shavonne Yu / Unsplash

🤔 The issue

As an app developer, how do you check what's included in your binary so that it isn't bloated by unused dependencies?

As a library developer, how do you design your library so that users won't see their binary size explode?

To answer these questions, we need to understand how the Go toolchain compiles your app.

💡
This post doesn't address other factors that may impact binary size, such as compilation flags.

🌌 Hypothesis

👨‍💻
I'm not an expert in compilers or the Go toolchain, so the vocabulary used in this post may be incorrect. Please contact me if you find mistakes!

Let's take for example a simple app, with only one external dependency:

flowchart LR
  subgraph mod1 [module example.com/mod1]
    direction LR
    subgraph pkga [pkg a]
      f1["func1 (uses gin)"]
      s1[struct]
      i1[interface]
    end
    subgraph pkgb [pkg b]
      f2[func2]
      f3[func3]
    end
  end
  subgraph app [module example.com/app]
    direction LR
    subgraph main [pkg main]
      fm["func main (uses func2)"]
      ...
    end
  end
  mod1 --> dep3[encoding/json]
  mod1 --> dep2[github.com/gin-gonic/gin]
  mod1 --> fmt
  app --> mod1

In this scenario, github.com/gin-gonic/gin is listed as a direct dependency in mod1's go.mod.

We can postulate that since we depend on mod1, we'll embed gin in our binary as well. At least that's what I imagined, coming from Python. And this wouldn't be good news since we only need func2 from package b, which doesn't need gin.

👨🏻‍🔬 Experimenting

Test repo

In order to test this hypothesis, I threw together a small experiment in the form of a repository containing the app described in the graph above, plus a few variants.

GitHub - Crocmagnon/go-binary-size

It contains two directories:

  • app: a Go module intended as an application. It contains 4 packages compilable as standalone binaries:
    • a, a2 and b depend exclusively on matching packages from mod1
    • control depends only on stdlib's fmt and is used as a control for what a small Go binary should weigh
  • mod1: a Go module intended to represent a library that app includes. It contains 3 packages:
    • a: its non-test file depends on a large external dependency (github.com/gin-gonic/gin). Its test files, whether in package a or in package a_test, depend on other external dependencies that are used nowhere else.
    • a2: same as a but doesn't contain the test files
    • b: depends only on fmt

Observations

When building the binaries from inside app, we observe the following:

$ make build
mkdir -p ./dist
GOARCH=amd64 GOOS=darwin go build -o ./dist/a ./a
GOARCH=amd64 GOOS=darwin go build -o ./dist/a2 ./a2
GOARCH=amd64 GOOS=darwin go build -o ./dist/b ./b
GOARCH=amd64 GOOS=darwin go build -o ./dist/control ./control

# on linux: stat --printf="%s %n\n" ./dist/*
$ stat -f "%z %N" ./dist/*
7261232 ./dist/a
7261232 ./dist/a2
2030336 ./dist/b
2030336 ./dist/control

$ shasum -a 256 ./dist/*
53b15316f40d69af54fb18ac9ec427b40c52fe9e7c852e2ebc3931f90ae851cb  ./dist/a
e5251a45a82b28cf8bc2276bf82a6ca2cf29cd05e274b0e07d87c401ba74322d  ./dist/a2
3feac94eec4c560a656aab6c9b3d07378f843189a9a9273c7a5351695d53bf9e  ./dist/b
7bf645706dc55aede94ec9a373dbca7f4a06383fdea723b7237f5b343ee28b19  ./dist/control

  • The sizes of a and a2 are equal, therefore the dependencies used only in the test files of a aren't built into the final binary.
  • The sizes of a and b are different. b is smaller, as expected. Therefore the dependencies of a and a2 aren't built into b, despite being listed in mod1's and app's go.mod.
  • The sizes of b and control are equal, which is interesting because the two programs don't do the same thing. The hashes confirm that the binaries differ, so the toolchain probably does a good job of flattening the dependency tree and compacting the code.

The depth package can also help us visualize the dependency as a tree.

$ go install github.com/KyleBanks/depth/cmd/depth@latest
go: downloading github.com/KyleBanks/depth v1.2.1

$ make deptree
mkdir -p ./deptree
depth ./a > ./deptree/a
depth ./a2 > ./deptree/a2
depth ./b > ./deptree/b
depth ./control > ./deptree/control

$ tail -1 ./deptree/a
113 dependencies (58 internal, 55 external, 0 testing).

$ diff ./deptree/{a,a2}
1,2c1,2
< ./a
<   └ example.com/mod1/a
---
> ./a2
>   └ example.com/mod1/a2

$ grep -rn "tonic" ./deptree/a

$ command cat ./deptree/b
./b
  └ example.com/mod1/b
    └ fmt
2 dependencies (1 internal, 1 external, 0 testing).

$ command cat ./deptree/control
./control
  └ fmt
1 dependencies (1 internal, 0 external, 0 testing).

I'm not including the full dependency trees for a and a2 because they're quite large. What's interesting is that they're identical, the only difference being the name of the package imported at the root. There's also no trace of github.com/loopfz/gadgeto in the dependency tree of a (the grep for "tonic" above returns nothing), which is what we expected.
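If you'd rather not install a third-party tool, the go command itself can produce a similar listing. Here is a minimal sketch that shells out to `go list -deps` and keeps only the packages outside the standard library. The helper names (`splitLines`, `externalDeps`) and the `listdeps` name in the usage message are made up for this example, and it assumes `go` is on your PATH and that you run it from inside the app module.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// splitLines turns the newline-separated output of `go list` into a slice,
// dropping empty lines.
func splitLines(out string) []string {
	var pkgs []string
	for _, line := range strings.Split(out, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			pkgs = append(pkgs, line)
		}
	}
	return pkgs
}

// externalDeps asks the go command for the packages matched by pattern
// (for example "./b") plus everything they import, directly or
// transitively, and keeps only those outside the standard library.
func externalDeps(pattern string) ([]string, error) {
	cmd := exec.Command("go", "list", "-deps",
		"-f", "{{if not .Standard}}{{.ImportPath}}{{end}}", pattern)
	cmd.Stderr = os.Stderr // let `go list` report its own errors
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return splitLines(string(out)), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: listdeps <package pattern>")
		os.Exit(2)
	}
	pkgs, err := externalDeps(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range pkgs {
		fmt.Println(p)
	}
}
```

Run against `./b`, I'd expect it to print only the main package itself and example.com/mod1/b, and against `./a` a much longer list that includes gin. Note that the listing includes the package you asked about, not just its dependencies.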

Conclusions

Based on these observations, my conclusion is that the Go toolchain treats the package, not the module, as the compilation unit. This means that since our example app only depends on package b from mod1, package a and its dependencies won't be included. The go.mod file of our example app will list example.com/mod1 as a direct dependency, but won't include github.com/gin-gonic/gin.
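To answer the "what's actually in my binary?" question directly, you can also read the build information that the toolchain embeds in the executable. The sketch below uses `debug/buildinfo.ReadFile` (Go 1.18+) to list the modules recorded in a binary. As far as I understand, that list only contains modules that contribute at least one package to the build, so I'd expect gin to show up for ./dist/a but not for ./dist/b. The helper names (`modulePaths`, `contains`) and the `whatsin` name in the usage message are made up for this example.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"runtime/debug"
)

// modulePaths returns the paths of the modules recorded as dependencies
// in a binary's build information.
func modulePaths(info *debug.BuildInfo) []string {
	paths := make([]string, 0, len(info.Deps))
	for _, dep := range info.Deps {
		paths = append(paths, dep.Path)
	}
	return paths
}

// contains reports whether target is one of the given module paths.
func contains(paths []string, target string) bool {
	for _, p := range paths {
		if p == target {
			return true
		}
	}
	return false
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: whatsin <binary> <module path>")
		os.Exit(2)
	}
	// ReadFile parses the build info section of a Go executable on disk.
	info, err := buildinfo.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	paths := modulePaths(info)
	for _, p := range paths {
		fmt.Println(p)
	}
	fmt.Printf("%s recorded in binary: %t\n", os.Args[2], contains(paths, os.Args[2]))
}
```

The same information is available without writing any code through `go version -m ./dist/a`, which prints one `dep` line per recorded module.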

The real dependency graph looks more like this:

flowchart LR
  subgraph mod1 [module example.com/mod1]
    direction LR
    subgraph pkga [pkg a]
      f1["func1 (uses gin)"]
      s1[struct]
      i1[interface]
    end
    subgraph pkgb [pkg b]
      f2[func2]
      f3[func3]
    end
  end
  subgraph app [module example.com/app]
    direction LR
    subgraph main [pkg main]
      fm["func main (uses func2)"]
      ...
    end
  end
  pkga --> fmt
  pkga --> dep2[github.com/gin-gonic/gin]
  pkga --> dep3[encoding/json]
  main --> pkgb
  pkgb --> fmt
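To make the difference with the first graph concrete, here is a toy model of the package-level graph above. It is a hand-written map, not toolchain output, and it assumes the main package's import path is example.com/app. Walking the imports from main shows which packages are reachable, and therefore candidates for ending up in the binary. The `imports` variable and the `reachable` function are names invented for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// imports is a hand-written model of the graph above: each package maps to
// the packages it imports. The import path used for main is an assumption.
var imports = map[string][]string{
	"example.com/app":    {"example.com/mod1/b"},
	"example.com/mod1/a": {"fmt", "github.com/gin-gonic/gin", "encoding/json"},
	"example.com/mod1/b": {"fmt"},
}

// reachable returns, in sorted order, every package that can be reached
// from root by following imports (root included).
func reachable(graph map[string][]string, root string) []string {
	seen := map[string]bool{}
	var visit func(pkg string)
	visit = func(pkg string) {
		if seen[pkg] {
			return
		}
		seen[pkg] = true
		for _, dep := range graph[pkg] {
			visit(dep)
		}
	}
	visit(root)

	out := make([]string, 0, len(seen))
	for pkg := range seen {
		out = append(out, pkg)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Prints example.com/app, example.com/mod1/b and fmt, one per line.
	for _, pkg := range reachable(imports, "example.com/app") {
		fmt.Println(pkg)
	}
}
```

Package a and gin never appear because nothing on the path from main imports them, even though they live in the same module as b. The model stops at fmt and ignores fmt's own imports, which a real build would of course follow.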

🌯 Takeaways

As a library developer, move functions/structs/interfaces with large dependencies to a separate package when it makes sense. For example, if you're building a logging library and you want to offer a middleware for gin, extract it to its own package. This will allow users of your library to not depend on gin if they don't want to.

As an app developer, before adding a new dependency for a very small subset of its features, consider copying the code you need instead. In our previous example, if func2 is small enough, and since it only depends on fmt, it should be easy to copy and maintain alongside your code rather than depending on an external package.