Just before I open-sourced revect, I had a problem. Well, two problems, actually.
I wanted to make sure that if people switched to a different embedding model later on, the system would still work. With vector embeddings, you turn text into vectors, and if you rely on OpenAI's models… when they deprecate one, you're screwed… unless you re-embed everything with a new model.
If you're not familiar, an embedding model converts words into vectors, basically points on a chart or map. This makes it possible to search by meaning. For example, search the word "government" and find everything related to presidents, even though the text doesn't exactly match.
So I did that: if the model changes, the system knows and re-embeds everything automatically.
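I won't reproduce revect's exact code here, but the core idea fits in a few lines. A minimal sketch, assuming a `memories` table that stores which model produced each vector (the table, columns, and `embed` helper are illustrative, not revect's actual schema or API):

```typescript
import { Database } from "bun:sqlite";

// Hypothetical sketch of the re-embedding check; table and column
// names are illustrative, not revect's actual schema.
const db = new Database("./data/revect.db");
const currentModel = process.env.EMBEDDING_MODEL ?? "text-embedding-3-small";

// Assumed helper: stub for whatever embedding API you actually call.
async function embed(text: string): Promise<number[]> {
  // Replace with a real call to your embedding provider.
  return Array.from({ length: 1536 }, () => 0);
}

// Every stored vector remembers which model produced it, so a model
// swap is detectable as a simple mismatch.
const stale = db
  .query("SELECT id, text FROM memories WHERE embedding_model != ?")
  .all(currentModel) as { id: number; text: string }[];

for (const row of stale) {
  const vector = await embed(row.text);
  db.query("UPDATE memories SET embedding = ?, embedding_model = ? WHERE id = ?")
    .run(JSON.stringify(vector), currentModel, row.id);
}
```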
The next problem: I wanted to make sure that even if a mistake is merged into main, we don't push out a new Docker image. For example, a mistake in a migration that would break existing users' databases.
You might imagine code referencing a new database field without the migration that renames the original columns. Anyone who uses "pull always" on Docker would receive the new code, and it would immediately break, because their database schema is different.
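As a concrete (hypothetical) illustration of that failure mode, using `bun:sqlite` since revect runs on Bun:

```typescript
import { Database } from "bun:sqlite";

// The user's existing database still has the old schema:
//   memories(id INTEGER, text TEXT)
// but the newly pulled code was written against a renamed column.
const db = new Database("./data/revect.db");

// Throws "no such column: content" on the old schema,
// crashing the container right after `docker pull`.
const row = db.query("SELECT content FROM memories WHERE id = ?").get(1);
```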
So I had a simple solution, and I let Claude Code implement it for me 😉
You can see the entire action here. Basically:
```javascript
// List releases (newest first) so we can locate the one just before the current tag
const releases = await github.rest.repos.listReleases({
  owner: context.repo.owner,
  repo: context.repo.repo,
});

const currentReleaseTag = context.payload.release.tag_name;
const currentIndex = releases.data.findIndex(r => r.tag_name === currentReleaseTag);

// Get previous release if it exists
if (currentIndex >= 0 && releases.data.length > currentIndex + 1) {
  const prevRelease = releases.data[currentIndex + 1];
  console.log(`Previous release: ${prevRelease.tag_name}`);
  return prevRelease.tag_name;
} else {
  console.log('No previous release found');
  return '';
}
```

This grabs the previously released git tag every time we make a new Docker image. I use GitHub's releases section to do deploys, so I always make my release tags incremental.
It doesn't really matter how you create them, as long as there is at least one prior release before the current build when you add this.
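For context, here's roughly how that script might be wired into the workflow. I'm assuming a step id of `prev_release`, since that's what the `steps.prev_release.outputs.result` reference in the next snippet expects:

```yaml
- name: Get previous release tag
  id: prev_release
  uses: actions/github-script@v7
  with:
    result-encoding: string  # return the tag as a plain string output
    script: |
      // ...the script shown above goes here
```

The migration test below then reads that output via `steps.prev_release.outputs.result`.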
```bash
# Create directory for the database
mkdir -p ./data

# Pull the previous image
PREV_TAG=${{ steps.prev_release.outputs.result }}
docker pull ${{ env.DOCKER_IMAGE }}:${PREV_TAG}

# Run migrations on the previous version
echo "Running migrations on previous version ${PREV_TAG}"
docker run --name prev-revect -v $(pwd)/data:/app/data \
  ${{ env.DOCKER_IMAGE }}:${PREV_TAG} bun run /app/src/database/migrations.ts up

# Build the current version for testing (only amd64 for testing)
docker build -t ${{ env.DOCKER_IMAGE }}:test .

# Test migrations with the new version
echo "Testing migrations on new version"
docker run --name new-revect -v $(pwd)/data:/app/data \
  ${{ env.DOCKER_IMAGE }}:test bun run /app/src/database/migrations.ts up
```

Okay, so here's a breakdown of what we just did above:
1. We pull the LAST release image.
2. We run the migrations on it; these are the migrations from that prior release. They run against the volume mount, /app/data, where revect initializes a SQLite file if one doesn't exist yet.
3. We build the new image and run its migrations against the database the prior version left behind, because we mount the same volume.
4. If anything fails, the release stops, and we don't push out anything broken to current users!
I thought it was pretty neat and intuitive.
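The reason a failed migration halts the release is just exit codes: if the migration runner exits non-zero, the `docker run` step fails and the workflow stops. A minimal sketch of what such a runner might look like (file path, table names, and migrations are illustrative; revect's actual migrations.ts will differ):

```typescript
import { Database } from "bun:sqlite";

// Minimal sketch of a migration runner whose failure halts the CI job.
const db = new Database("/app/data/revect.db");
db.run("CREATE TABLE IF NOT EXISTS _migrations (name TEXT PRIMARY KEY)");

const migrations: { name: string; sql: string }[] = [
  { name: "0001_init", sql: "CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, text TEXT)" },
  // ...one entry per schema change
];

if (process.argv[2] === "up") {
  for (const m of migrations) {
    // Skip migrations that already ran (e.g. applied by the previous image)
    if (db.query("SELECT 1 FROM _migrations WHERE name = ?").get(m.name)) continue;
    try {
      db.run(m.sql);
      db.query("INSERT INTO _migrations (name) VALUES (?)").run(m.name);
    } catch (err) {
      console.error(`Migration ${m.name} failed:`, err);
      // A non-zero exit fails the `docker run` step, which stops the release
      process.exit(1);
    }
  }
}
```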
Okay, that's cool and all, but how might we do this with external databases that aren't SQLite?
I'm not going to go too in depth, but you can use all of the same paradigms I shared above.
The only difference is that you need to `docker run` Postgres (or whatever external database you use) inside your GitHub Actions workflow, then run your app container with the correct environment variables to connect.
Essentially, instead of a volume mount for SQLite, you start Postgres and inject the Postgres connection URL into the previous container.
Once that's done, the base migrations have already been applied. Now you inject the same connection URL into the new container and run its migrations, exactly like we did with SQLite. The difference is that you now have a second container running inside GitHub Actions (your database server).
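This isn't from revect's workflow, but a rough sketch of how the Postgres variant could look. The image tag, credentials, and the `DATABASE_URL` env var are assumptions; match them to whatever your app expects:

```bash
# Start a throwaway Postgres server inside the runner (credentials are illustrative)
docker run -d --name ci-postgres \
  -e POSTGRES_USER=app -e POSTGRES_PASSWORD=app -e POSTGRES_DB=app \
  -p 5432:5432 postgres:16

# Wait until Postgres is ready to accept connections
until docker exec ci-postgres pg_isready -U app; do sleep 1; done

DB_URL="postgres://app:app@localhost:5432/app"

# Apply the previous release's migrations first...
docker run --network host -e DATABASE_URL="$DB_URL" \
  ${{ env.DOCKER_IMAGE }}:${PREV_TAG} bun run /app/src/database/migrations.ts up

# ...then run the new image's migrations against that same database
docker run --network host -e DATABASE_URL="$DB_URL" \
  ${{ env.DOCKER_IMAGE }}:test bun run /app/src/database/migrations.ts up
```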
I think this is worth doing, especially for releases of open source projects, and even for internal projects if they are consumed by many internal teams.
If you found this interesting, please give it a like or comment. I haven't seen much discussion of this when it comes to Docker, or much content around safety nets for migration issues.