Avoid Using GitHub Docker Caching on Self-Hosted Runners


2025-01-11, a 7-minute read, for Software Engineers

TL;DR

Just don’t use it, because of these problems:

  1. GitHub cache storage network bandwidth is very slow: ours was limited to ~32 MB/s when running from AWS self-hosted runners. Because of this, it takes about one minute to upload 1 GB of cache, and another minute to download that cache on your next run.
  2. If your dependencies are not cached on the runner, you will get a slightly different build each time, invalidating the cache for the “install dependencies” layer, which in turn invalidates all subsequent layers.

Docker custom caching is a feature that lets you export the layers of your Docker images to a custom location, which can speed up subsequent builds.
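For illustration, here is a minimal sketch of the idea, with a placeholder image name and cache path; the --cache-to/--cache-from flags tell BuildKit where to export and import the layer cache:

docker buildx build \
  --cache-from type=local,src=./tmp/.buildx-cache \
  --cache-to type=local,dest=./tmp/.buildx-cache,mode=max \
  -t myapp:ci \
  .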

Prerequisites

Solid Understanding of:

  • Docker
  • GitHub Actions

Docker Cache Modes

Docker offers two caching modes, max and min:

In min cache mode (the default), only layers that are exported into the resulting image are cached, while in max cache mode, all layers are cached, even those of intermediate steps.

While min cache is typically smaller (which speeds up import/export times, and reduces storage costs), max cache is more likely to get more cache hits. Depending on the complexity and location of your build, you should experiment with both parameters to find the results that work best for you.

In this post, I'm using the max mode.
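Concretely, the mode is set on the cache export, i.e. appended to the --cache-to value shown above:

--cache-to type=local,dest=./tmp/.buildx-cache,mode=min   # default: only final-image layers are exported
--cache-to type=local,dest=./tmp/.buildx-cache,mode=max   # intermediate-stage layers are exported too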

Implementation Requirements

There are two ways to implement this:

  1. With bake-action
  2. Manually caching and restoring using the cache action

Due to environment limitations, I chose the manual option.
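For completeness, option 1 would look roughly like the following workflow snippet; the action version and inputs are assumptions on my part rather than a tested setup:

- uses: docker/setup-buildx-action@v3
- uses: docker/bake-action@v5
  with:
    files: |
      docker-compose.yml
      docker-compose.dev.yml
    load: true
    set: |
      *.cache-from=type=local,src=./tmp/.buildx-cache
      *.cache-to=type=local,dest=./tmp/.buildx-cache,mode=max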

The application that I wanted to cache consists of multiple images that need to be built and tested.

Docker image setup

If you want to cache every image, you can do it with the cache_to and cache_from options; the command should be invoked like so in your CI:

docker buildx bake -f docker-compose.yml -f docker-compose.dev.yml --load --set *.cache_from=type=local,src=./tmp/.buildx-cache --set *.cache_to=type=local,dest=./tmp/.buildx-cache

However, if you want control over which services to cache, then include these in your docker-compose file:

cache_from:
  - type=local,src=./tmp/.buildx-cache-restored
cache_to:
  - type=local,dest=./tmp/.buildx-cache,mode=max

Full docker-compose file:

services:
  frontend:
    build:
      target: builder
      cache_from:
        - type=local,src=./tmp/.buildx-cache-restored
      cache_to:
        - type=local,dest=./tmp/.buildx-cache,mode=max
    command: npm run dev
    ...
  api:
    build:
      args:
        - NODE_ENV=development
      target: builder
      cache_from:
        - type=local,src=./tmp/.buildx-cache-restored
      cache_to:
        - type=local,dest=./tmp/.buildx-cache,mode=max
    ...

Workflow file

Required GitHub Actions setup

  1. Set up Docker Buildx with docker/setup-buildx-action@v3
  2. Restore the cache using actions/cache/restore@v4
  3. Rename the cache directory if there is a cache hit
  4. Create the cache directories if there is no cache hit, because Docker complains if the directory does not exist
  5. Build the images using docker buildx bake -f docker-compose.yml --load --provenance=false
  6. Save the cache using actions/cache/save@v4
  7. Clean up the cache directories
Full workflow file:

name: Docker Caching

on:
  pull_request:

jobs:
  tests:
    runs-on: self-hosted # your runner name here
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - name: Set up Docker build cache
        id: cache-docker-layer
        uses: actions/cache/restore@v4
        with:
          path: ./tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-docker-

      - name: move cache to buildx-cache-restored
        if: steps.cache-docker-layer.outputs.cache-hit == 'true'
        run: mv ./tmp/.buildx-cache ./tmp/.buildx-cache-restored || true

      - name: ls ./tmp/.buildx-cache-restored
        if: steps.cache-docker-layer.outputs.cache-hit == 'true'
        run: ls ./tmp/.buildx-cache-restored || true

      - name: create dirs for cache if no cache hit
        if: steps.cache-docker-layer.outputs.cache-hit == 'false'
        run: |
          mkdir -p ./tmp/.buildx-cache-restored || true
          chmod -R 777 ./tmp/.buildx-cache-restored || true
          mkdir -p ./tmp/.buildx-cache || true
          chmod -R 777 ./tmp/.buildx-cache || true

      - name: Build images
        # using multiple -f docker-compose files to extend and override the base docker-compose
        run: docker buildx bake -f docker-compose.base.yml -f docker-compose.e2e.yml -f docker-compose.ci.yml --load --provenance=false

      - name: Run tests here
        run: echo "Running tests..."

      - name: ls ./tmp/.buildx-cache
        run: ls ./tmp/.buildx-cache

      - name: Cache Docker layers
        if: always()
        uses: actions/cache/save@v4
        with:
          path: ./tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}

      - name: Cleanup containers
        if: always()
        run: |
          rm -rf ./tmp/.buildx-cache || true
          rm -rf ./tmp/.buildx-cache-restored || true

Network Speed Analysis

The transfer speed on a cache hit for ~1 GB of cache is shown below.

Network Speed GitHub Runner vs Self-Hosted Runners

As you can see from the figure above, network performance was roughly:

  • GitHub-hosted runners: ~120 MB/s on average (~1 Gbps)
  • Self-hosted runners: ~32 MB/s

Impact: cache transfers on self-hosted runners run at roughly a quarter of the speed of GitHub-hosted runners.
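If you want to sanity-check your own runner's bandwidth, one rough way is to time a large download with curl; the URL below is a placeholder for any ~1 GB test file reachable from your runner:

curl -sS -o /dev/null \
  -w 'downloaded %{size_download} bytes at %{speed_download} bytes/s\n' \
  https://example.com/1gb-test-file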

Build Time Analysis

  • Original without docker caching: 7 mins
  • With docker caching:
    • On first run: 11-12 mins
    • Subsequent runs:
      • Expected: 4-5 mins (if application dependencies were cached)
      • Reality:
        • Worst case: 11-12 mins (Dependency installation variations)
        • Best case: ~5 mins (when the stars align)

Despite only caching ~1 GB worth of layers on GitHub, the runtime was hit and miss (pun intended).

I suspect this is because the application dependencies (such as React) are not cached on the runner: each build may download a slightly different variation of its dependencies (there is no guarantee that the react 17.1.2 downloaded three hours ago is identical to the react 17.1.2 downloaded today unless a hash is pinned), which invalidates the cache for the “npm install” layer and, with it, all subsequent layers.
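One mitigation, assuming an npm-based image, is to make the install layer depend only on the lockfile so its cache key stays stable between builds. A rough sketch, where the base image and paths are placeholders:

FROM node:18-alpine AS builder
WORKDIR /app
# Copy only the manifests first, so this layer's cache key depends solely on them
COPY package.json package-lock.json ./
# npm ci installs exactly what the lockfile pins (including integrity hashes)
RUN npm ci
# Copy the rest of the source afterwards, so code edits don't invalidate the install layer
COPY . .
RUN npm run build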

However, seeing that my local builds were always faster, I decided to reuse a runner instead of letting it get recycled, relying on Docker's default caching instead. Doing so decreased run time by more than 50%, to only 2-3 mins.

Given that this job is triggered many times a day, the mean time approaches 3 mins without any manual caching, which is good enough. However, runner reuse comes with security risks, so I ultimately abandoned it.

I haven't looked into why default Docker caching produced more consistent builds than explicit caching, i.e. what guarantees that the install-dependencies step gets reused. This would be interesting to investigate in the future.

O Keeper of Memory, keep my cache always warm

FIN

Just don't use it. I offer two alternatives:

  1. Runner reuse (warm cache) is more effective than relying on GitHub caching for this use case, if your security team allows it.

  2. If you still want to do caching on self-hosted runners, you should look into GitHub Actions Cache Server.

