Ask HN: Would you use a serverless, pay-per-second model for AI inference?

Hey HN,

As a developer, I've been using the big AI APIs quite a bit for my projects. While the technology is incredible, I keep hitting a financial wall. I've had single, complex prompts with large contexts cost me $1-2 each (Claude Opus 4), which is a tough pill to swallow when you're a solo dev paying out of pocket.

It feels like the current pricing models can be a real barrier to experimentation and building more ambitious, personal projects that my day job won't cover.

This has led me to explore a "what if" scenario for a more developer-friendly pricing model, and I'd love to get your thoughts on its feasibility.

The Idea: Serverless AI Inference

My proposal is a platform that prices AI inference based on the actual compute time used.

Think of it exactly like AWS Lambda, but for LLM prompts. The model is simple:

You don't host or manage anything. You just send your prompt to an API endpoint.

We handle routing it to an available GPU from a large, shared hardware pool.

You get charged only for the execution time your task consumes on the GPU, billed by the second (see the sketch below).
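
To make the flow concrete, here's a rough sketch of what a call might look like from the client side. Everything in it is hypothetical: the endpoint, the payload shape, and the response fields are just how I imagine it working, since nothing exists yet.

    import requests

    # Hypothetical endpoint and API; nothing here is real yet.
    resp = requests.post(
        "https://api.example.com/v1/infer",
        headers={"Authorization": "Bearer <api-key>"},
        json={
            "model": "llama-3-70b",
            "prompt": "Summarize this contract in three bullet points.",
            "max_tokens": 512,
        },
    )
    result = resp.json()

    # Billing would be tied to GPU wall-clock time, not token counts.
    print(result["output"])
    print(result["gpu_seconds"], result["cost_usd"])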

This approach provides a direct, transparent link between the resources you use and what you pay. For tasks that are compute-heavy but don't necessarily produce a massive token count, this could be a much more predictable and affordable way to build.
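
For a back-of-envelope sense of the numbers: assuming a GPU rents for roughly $2/hour (my rough ballpark for an A100 on spot markets, not a quote), per-second billing works out like this:

    HOURLY_GPU_RATE = 2.00                    # assumed rental cost, $/hour
    PER_SECOND_RATE = HOURLY_GPU_RATE / 3600  # ~$0.00056 per GPU-second

    gpu_seconds = 20  # a heavy prompt that occupies the GPU for 20s
    cost = gpu_seconds * PER_SECOND_RATE
    print(f"${cost:.4f}")  # ~$0.0111

That ignores margin, idle GPU time, and the quality gap between open models and something like Opus, but it shows why the raw arithmetic tempts me.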

My Questions for the Community:

Do you also find the cost of using AI APIs to be a barrier?

As a developer, would you prefer paying for compute time in a serverless model like this? What potential downsides do you see?

I'm a backend engineer, and building this is a challenge I'm willing to take on. But is it economically feasible? Am I underestimating the costs of managing a shared GPU pool and competing with existing players? Or could this be a sustainable business that genuinely solves a problem?

I haven't started building anything yet, but the high costs I'm facing are pushing me to seriously consider it. I wanted to tap into the collective wisdom of HN to see if this is a problem worth solving or if I'm just shouting into the void.
