Go dependencies and binary size
How do you keep your app binary from being bloated by unused dependencies? And how do you design a library so that it doesn't happen to your users?
🤔 The issue
As an app developer, how do you check what ends up in your binary, so that it isn't bloated by unused dependencies?
As a library developer, how do you design your library so that users don't see their binary size explode?
To answer these questions, we need to understand how the Go toolchain compiles your app.
🌌 Hypothesis
Let's take for example a simple app, with only one external dependency:
In this scenario, github.com/gin-gonic/gin is listed as a direct dependency in mod1's go.mod.
We can postulate that since we depend on mod1, we'll embed gin in our binary as well. At least that's what I imagined, coming from Python. And this wouldn't be good news, since we only need func2 from package b, which doesn't need gin.
👨🏻‍🔬 Experimenting
Test repo
In order to test this hypothesis, I threw together a small experiment in the form of a repository containing the app described in the graph above, plus a few variants.
It contains two directories:
- app: a Go module intended as an application. It contains 4 packages compilable as standalone binaries:
  - a, a2 and b depend exclusively on the matching packages from mod1
  - control depends only on the standard library's fmt and is used as a control for what a small Go binary should weigh
- mod1: a Go module intended to represent a library that app includes. It contains 3 packages:
  - a: depends on a large external dependency (github.com/gin-gonic/gin) in its non-test file, and also on some other external dependencies used exclusively in the test files, whether in the same package or in the a_test package
  - a2: same as a, but without the test files
  - b: depends only on fmt
Observations
When building the binaries from inside app
, we observe the following:
$ make build
mkdir -p ./dist
GOARCH=amd64 GOOS=darwin go build -o ./dist/a ./a
GOARCH=amd64 GOOS=darwin go build -o ./dist/a2 ./a2
GOARCH=amd64 GOOS=darwin go build -o ./dist/b ./b
GOARCH=amd64 GOOS=darwin go build -o ./dist/control ./control
# on linux: stat --printf="%s %n\n" ./dist/*
$ stat -f "%z %N" ./dist/*
7261232 ./dist/a
7261232 ./dist/a2
2030336 ./dist/b
2030336 ./dist/control
$ shasum -a 256 ./dist/*
53b15316f40d69af54fb18ac9ec427b40c52fe9e7c852e2ebc3931f90ae851cb ./dist/a
e5251a45a82b28cf8bc2276bf82a6ca2cf29cd05e274b0e07d87c401ba74322d ./dist/a2
3feac94eec4c560a656aab6c9b3d07378f843189a9a9273c7a5351695d53bf9e ./dist/b
7bf645706dc55aede94ec9a373dbca7f4a06383fdea723b7237f5b343ee28b19 ./dist/control
- The sizes of a and a2 are equal, therefore the dependencies used only in the test files of a aren't built into the final binary.
- The sizes of a and b are different: b is smaller, as expected (about 2.0 MB versus 7.3 MB, roughly 3.6 times smaller). Therefore the dependencies of a and a2 aren't built into b, despite being listed in both mod1's and app's go.mod.
- The sizes of b and control are equal, which is interesting because the two binaries don't do the same thing. They are indeed different, as shown by the hashes, so the small amount of extra code in b is most likely absorbed without changing the file size, for example through inlining or alignment padding in the executable.
The depth package can also help us visualize the dependencies as a tree.
$ go install github.com/KyleBanks/depth/cmd/depth@latest
go: downloading github.com/KyleBanks/depth v1.2.1
$ make deptree
mkdir -p ./deptree
depth ./a > ./deptree/a
depth ./a2 > ./deptree/a2
depth ./b > ./deptree/b
depth ./control > ./deptree/control
$ tail -1 ./deptree/a
113 dependencies (58 internal, 55 external, 0 testing).
$ diff ./deptree/{a,a2}
1,2c1,2
< ./a
< └ example.com/mod1/a
---
> ./a2
> └ example.com/mod1/a2
$ grep -rn "tonic" ./deptree/a
$ command cat ./deptree/b
./b
  └ example.com/mod1/b
    └ fmt
2 dependencies (1 internal, 1 external, 0 testing).
$ command cat ./deptree/control
./control
  └ fmt
1 dependencies (1 internal, 0 external, 0 testing).
I'm not including the full dependency trees for a and a2 because they're quite large. What's interesting to note is that they're identical, the only difference being the name of the package at the root. There's also no trace of github.com/loopfz/gadgeto in the dependency tree of a (the grep for its tonic package above returns nothing), which is what we expected.
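The same check can be done without a third-party tool: `go list -deps ./b` prints every package the build needs, one import path per line. The sketch below classifies such a list into standard library and external packages. It relies on a common heuristic (standard library import paths have no dot in their first element), and the function names isStdlib and split are made up for this example. The sample input is the tree shown above for ./b; real `go list -deps` output would also include the main package and fmt's own transitive imports, so the counts won't match depth's summary.

```go
package main

import (
	"fmt"
	"strings"
)

// isStdlib applies a heuristic: standard library import paths have no
// dot in their first path element ("fmt", "net/http"), whereas module
// paths usually start with a domain name ("github.com/...").
func isStdlib(importPath string) bool {
	first, _, _ := strings.Cut(importPath, "/")
	return !strings.Contains(first, ".")
}

// split separates import paths into standard library and external
// ones, skipping blank lines.
func split(paths []string) (std, ext []string) {
	for _, p := range paths {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if isStdlib(p) {
			std = append(std, p)
		} else {
			ext = append(ext, p)
		}
	}
	return std, ext
}

func main() {
	// Sample input: the dependencies of ./b as shown in the tree above.
	// In practice this list could come from `go list -deps ./b`.
	deps := []string{"example.com/mod1/b", "fmt"}

	std, ext := split(deps)
	fmt.Printf("%d standard library, %d external\n", len(std), len(ext))
	for _, p := range ext {
		fmt.Println(p)
	}
	// Output:
	// 1 standard library, 1 external
	// example.com/mod1/b
}
```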
Conclusions
Based on these observations, my conclusion is that the Go toolchain treats the package, not the module, as the compilation unit. This means that since our example app only depends on package b from mod1, package a and its dependencies won't be included. The go.mod file of our app will list example.com/mod1 as a direct dependency, but won't include github.com/gin-gonic/gin.
The real dependency graph looks more like this:
🌯 Takeaways
As a library developer, move functions, structs and interfaces with large dependencies to a separate package when it makes sense. For example, if you're building a logging library and want to offer a middleware for gin, extract that middleware into its own package. Users of your library then won't depend on gin unless they import that package.
As an app developer, before adding a new dependency for a very small subset of its features, consider copying the code instead. In our previous example, if func2 is small enough, and since it only depends on fmt, it should be easy to copy it and maintain it alongside your code rather than depending on an external package.
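As a rough illustration of the copying approach, here is what it could look like. The body of func2 is not shown in this post, so the one below is invented purely for the example. The point is the comment recording where the code came from, which makes later manual updates easier.

```go
package main

import "fmt"

// func2 is a hypothetical stand-in for the function copied from
// example.com/mod1/b. Its real body isn't shown in the post, so this
// implementation is invented for illustration.
//
// Copied from example.com/mod1/b: record the upstream version here so
// that later changes upstream can be ported by hand.
func func2(name string) string {
	return fmt.Sprintf("Hello, %s!", name)
}

func main() {
	fmt.Println(func2("gopher")) // prints "Hello, gopher!"
}
```

With the function living in your own module, example.com/mod1 no longer needs to appear in your go.mod at all.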