Migrating over 30 lambdas from Serverless Framework with LLMs


I often find people surprised when I say that my “offline” desktop application, written in Qt/C++, has a “backend”. What started as two or three lambda functions issuing licenses and sending payment confirmations grew over the years into more than 30 lambda functions: various webhooks, crons (license expiry reminders, customer development, etc.), proxies, helpdesk, AI inference and much more. From day one I decided to go with a serverless approach, because the majority of the functionality was in the desktop app itself, and I didn’t want to start managing (and paying for) additional servers for what was mostly one-off functionality.

At the time, about 7 or 8 years ago, deploying lambda functions to AWS was a much bigger pain than it is now. So the Serverless Framework, which eliminated lots of custom scripts and instead worked with Infrastructure-as-Code, was an easy choice. Little did I know that as my backend grew, I would end up with huge chunks of custom CloudFormation templates anyway, patching holes in Serverless’ functionality, or that the missing parts of Serverless implemented as external plugins would go unsupported by their authors for years. And yet, switching away wasn’t on the horizon: it was always much easier to hack around its next limitation than to rewrite everything on something else entirely. It wasn’t on the horizon until recently.

Enshittification of Serverless

Just to be clear, I’m not blaming Serverless for that; it’s the typical strategy of so many products that start free/open and then, once they have enough customers, close up and monetize. It was brilliantly described by Brian Balfour using the examples of larger players like Facebook, YouTube and, recently, “Open”AI.

GitLab issue I created a long time ago

One day, my CI started warning me that I was using soon-to-be-deprecated versions of the Serverless Framework that would stop receiving security updates. That’s for Javascript, which I consider inherently unsafe anyway. Still, I postponed the pain of migrating the “whole backend”, which by that time had grown significantly and contained tens of DynamoDB tables, S3 buckets, SQS queues, SNS topics, State Machines, custom CloudWatch streams, etc. I didn’t know what to migrate to, and I didn’t want to know, meaning I didn’t have time to learn something new.

NOTE: I wasn’t even close to qualifying for the $2M ARR that required a paid Serverless license. I did, however, initially qualify under their limit on services deployed for free (their credit system, which they later pulled back). But the notion of letting a random untrustworthy company know what, when and where I deploy in AWS is just bonkers, in my opinion. Not to mention the additional security exposure of your AWS access keys and in-app secrets.

When I first started my app’s “backend”, I wrote a note in the docs that I chose Serverless also because it was supposed to be vendor-neutral, which should theoretically have made it easier to migrate my backend to another cloud provider like Azure. However, not only did my project grow a huge vendor lock-in to AWS-only products and their logic over the years, but with the new version 4, Serverless also completely dropped support for non-AWS cloud providers.

Enter LLMs

Although these days I’m primarily focused on my new project, my previous project’s broken CI eventually started getting on my nerves. I didn’t redeploy or make changes there often enough to be truly frustrated, but it was always sitting on the back burner. So now that LLMs have become ubiquitous enough, I thought of giving the migration a go.

Even though LLMs have so far never worked for me the way they are sold by the likes of Satya Nadella, I figured that this is exactly the type of task they should be able to do (relatively atomic, very limited in scope, a “translation” from A to B).

AWS CDK vs AWS SAM

I didn’t want to use any external provider like Pulumi (or, for that matter, Terraform, which pulled off a move remotely similar to Serverless’), so there were only two platform-native (AWS) choices: SAM or CDK. The benefit of using platform-native frameworks is obvious: AWS is already getting money from you for resources, so it has no incentive to charge you more for the way you create those resources, and AWS has a better track record of supporting its tooling (hi, GCP!).

AWS SAM (Serverless Application Model) seemed like a “more verbose” version of the Serverless Framework, but from AWS (read: a hope it will not be deprecated any time soon). Everything was familiar: a bunch of CloudFormation YAML stitched together by more custom YAML. It is still Infrastructure-as-Code, and the amount of YAML-to-YAML changes from Serverless is bearable.
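
For the unfamiliar: a SAM template is just a CloudFormation template with a Transform header and a few higher-level resource types. A minimal, hypothetical example (not from my actual backend):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
# the Transform line is what turns plain CloudFormation into SAM
Transform: AWS::Serverless-2016-10-31

Resources:
  HelloFunction:
    # expands at deploy time into a Lambda function, an IAM execution
    # role and the necessary permissions
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./build/hello/
      Handler: bootstrap
      Runtime: provided.al2023
```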

AWS CDK (more similar to Pulumi) is a mixed bag of writing your infrastructure imperatively as an app in one of the “supported languages” (warning: the cake was a lie). Before deciding, I wanted to try it, so I scaffolded a Hello World example in Go (which was my language of choice). And, surprise: even if you use the Go version, internally it calls the Javascript framework of the CDK, which you need to install and maintain via package.json. Using CDK would also have required a full rewrite of declarative Serverless YAML into imperative code. Another minor argument was that CDK is much younger than SAM and has less material online for LLM training.

So the choice was made in favor of SAM and, looking back, I do not regret it.

Doing test runs

I was hesitant to just add all my Serverless templates into Aider’s context and say something like “now migrate all of it to AWS SAM” (and, looking back, to then review 5453 new lines of SAM YAML). Luckily, I have a few other projects that use lambda functions, and all of them were built on Serverless. So I started migrating them one by one, not only to test LLMs, but also to “learn” SAM. All of these projects are rather small compared with my app’s backend: each contained only a few functions at most, a single DynamoDB table and sometimes something more exotic like an SQS queue. Also, the user impact was restricted to me alone. In other words, a perfect “sandbox”.

These migrations allowed me to learn the common shortcomings of LLMs on this particular task (I was using Claude 3.5 Sonnet and, later, Gemini 2.5 Pro Preview):

  • processing large contexts again and again is simply expensive (each day I clocked $6-10 for Anthropic)
  • LLMs follow instructions slavishly and are not able to “think outside the box”, which is sometimes needed to solve the “actual task”. For example, sometimes cutting a corner at a higher or lower level of abstraction eliminated the entire problem the LLM was struggling with.
  • correct YAML indentation is really hard for LLMs for some reason (see the sketch after this list)
  • multiple hallucinations of non-existent features in SAM (e.g. “creative” built-in SAM policy names for certain services, also sketched below)
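
To illustrate the last two points, here is a hypothetical snippet of the kind of SAM YAML the models kept stumbling over (the resource names are made up, but DynamoDBCrudPolicy is a real SAM policy template, unlike the ones the models would sometimes invent):

```yaml
Resources:
  LicenseFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bootstrap
      Runtime: provided.al2023
      Policies:
        # A real built-in SAM policy template; the models would often
        # hallucinate plausible-sounding names like
        # "DynamoDBFullCrudAccessPolicy" that simply do not exist.
        - DynamoDBCrudPolicy:
            TableName: !Ref LicensesTable
      # A typical indentation failure was emitting "Events:" at the same
      # level as "Properties:", producing a template that fails
      # validation; it must be nested under Properties, like here:
      Events:
        Api:
          Type: HttpApi
          Properties:
            Path: /licenses
            Method: post
```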

Getting a bit frustrated with Sonnet, I decided to give Gemini a try, using the Gemini 2.5 Pro Preview model available at the time of the migration. That model came with its own weird quirks, like inserting comments on every line it changes (also inside HTML tags’ attributes, but that’s another story), so I had to add “do NOT add any comments” to every /code chat completion I ran in Aider. But the main benefit was that it was much cheaper to use with large contexts, so I stuck with it for the few remaining lambdas.

This exercise familiarized me enough with SAM’s structure (e.g. how serverless.yml becomes samconfig.toml + template.yaml), its limitations, idiosyncrasies and ideas; it polished my new deployment Makefile commands and prepared me for migrating the main app’s backend.
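
Roughly, that mapping looks like this (a hand-wavy sketch with made-up names; deployment settings such as stack name and region move out of serverless.yml into samconfig.toml):

```yaml
# Before: serverless.yml (Serverless Framework)
service: licensing
provider:
  name: aws
  runtime: provided.al2023
  region: eu-west-1
functions:
  webhook:
    handler: bootstrap
    events:
      - httpApi:
          path: /webhook
          method: post
---
# After: template.yaml (AWS SAM)
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Globals:
  Function:
    Runtime: provided.al2023   # roughly the old "provider" block
Resources:
  WebhookFunction:             # each old "functions:" entry becomes a resource
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./build/webhook/
      Handler: bootstrap
      Events:
        Hook:
          Type: HttpApi
          Properties:
            Path: /webhook
            Method: post
```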

Migrating the main application

My overall migration plan was the following:

  • make the dev environment based on SAM fully working
  • deploy a new SAM prod environment, with a different naming convention, side by side with the Serverless prod environment
  • migrate data, external references, DNS, etc. to the SAM version, verify that it works, and eventually nuke the Serverless version

The initial work of migrating around 15 serverless.yml files into corresponding template.yaml variants was accomplished in just 2 days in Aider. However, I spent the rest of the week fixing every template (some of them multiple times) after Gemini, reading every line myself and changing every other one.

My default workflow in Aider was: get Gemini to accomplish the next thing, review the commits, attempt to fix those commits with Gemini, then give up and fix them myself.

Work cycle: fixing stuff myself after a bunch of changes by Gemini/Aider

Did Gemini save me time? In the end, I don’t know for sure. But it felt mentally easier to fix broken SAM templates than to learn SAM from scratch and write the files myself.

In total, it took a week to get the dev environment to spin up without errors, and I consider that a success. Migrating prod, however, was quite hard because of the need to back up and migrate user data and the various connections to external services.

The aftermath

Obviously, the goal of no longer depending on Serverless was achieved, and now, with gratitude, I can finally forget about it. Beyond that, the migration had a few side effects and lessons:

  1. My app’s backend had been evolving over the course of almost a decade, and as much as I attempted to keep things organized and document what I thought was important, it had a lot of “ad-hoc things” living somewhere in the AWS dashboard, or in Cloudflare DNS, or as otherwise implicit, undocumented assumptions. Doing a full migration/deployment forced me to write a careful migration plan and then, during its execution, to document everything that was actually needed to deploy a whole new stack. Moving forward, I now have much better documentation of all those implicit things.

  2. One thing this migration forced me to do was add more end-to-end tests. You see, I knew I would be too lazy to manually test everything after the migration, so I wrote way more end-to-end tests against the existing infra to check whether they would still pass after the prod migration. These tests are still running and will benefit these services in the future.

  3. I made a number of mistakes during the migration. They were uncovered during the 2 months that followed and were cosmetic enough not to cause actual problems for end users. Most of them were found thanks to my passive log alerting:

    • external integrations and undocumented ad-hoc changes (e.g. wrong prefixes for S3 keys, outdated CloudFront origins connected to the old S3 buckets via a manual policy, a mistake in DNS records and the like)
    • AWS-specific problems that I learned about only during the migration (for example, names of API keys for CloudFront have to be unique across all CloudFormation stacks, and custom domains are managed differently in Serverless and SAM)
    • the compute part of my previous backend services (the Serverless version) never had a “maintenance mode”, in which cached data would be read-only but no new writes/modifications were allowed. This would have allowed for a smooth migration, so I had to add it before doing anything with prod (see the sketch after this list).
    • also, I realized that in my desktop app itself I had never envisioned “maintenance mode” UI handling for end users, during which some APIs were not “failing”, but just “temporarily unavailable”. It seems obvious in retrospect, but there was no use case for it in all those years. Releasing and pushing an update for the desktop app is a major pain, so this was sacrificed for this migration.
  4. Overall, deployments with Serverless were faster thanks to all the custom ways Serverless was doing things (serverless compose, routers, managing S3 buckets for deployments, etc.). With AWS SAM and CloudFormation templates, even though deployment is a bit slower, there’s the benefit of applying only atomic diffs of changes and relying on the core technology that powers all other ways to deploy to AWS.
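
As for the maintenance mode mentioned above, the toggle itself can be as simple as a stack parameter surfaced to every function as an environment variable. A minimal template fragment as a sketch, with hypothetical names (this is not my actual backend’s template):

```yaml
Parameters:
  MaintenanceMode:
    Type: String
    AllowedValues: ["on", "off"]
    Default: "off"

Globals:
  Function:
    Environment:
      Variables:
        # every write-path handler checks this flag and returns a
        # "temporarily unavailable" response instead of mutating data
        MAINTENANCE_MODE: !Ref MaintenanceMode
```

Flipping it on is then just a redeploy with sam deploy --parameter-overrides MaintenanceMode=on, with no template changes required.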
