Build for GPT-5, ship with GPT-4: why shipping worse products today makes sense


"The best way to predict the future is to create it." — Peter Drucker

Have you ever noticed how some tech problems just... vanish? Remember when mobile storage was this huge constraint everyone optimized around? Or when image compression was this whole specialized field? Remember progressively loading JPEGs?

The AI world moves even faster. While one team burns months building complex chunking algorithms to handle context window limitations, another team simply waits for the next model release to make the whole problem (and often the whole startup) irrelevant.

If you only have 5 minutes: here are the key points

  • AI product development is shifting: Winning teams prioritize building for future capabilities, not just optimizing for current model constraints.

  • Constraint Decay Principle: Many technical limitations—like token limits or high costs—fade rapidly with new model releases.

  • Examples that paid off: Companies like Granola and Sourcegraph thrived by betting on near-term model improvements instead of solving temporary issues with permanent code.

  • Strategic framework: Sort constraints by how fast they’ll decay (fast, medium, or slow), then decide whether to work around or wait them out.

  • AI-technical debt is real: Investing heavily in solutions to soon-to-vanish problems is a liability, not an asset.

  • Balance is key: Future-ready teams still deliver value today, but avoid overengineering for constraints that won’t last.

Here's what separates successful AI products from the ones that die on the vine: building for tomorrow's capabilities while shipping with today's technology.

I like Granola, the super meeting note-taking app. The Granola team faced this exact dilemma when building their AI meeting assistant. Their first version couldn't handle meetings longer than 30 minutes because of model context limitations. Most teams would've done the "responsible" thing - spend months building complex solutions to chunk and process longer meetings.

But they didn't.

Instead, as CEO Chris Pedregal noted, they built a product "at the mercy of the ongoing improvement of commercial AI models" - and they were banking on those tools getting better and cheaper over time. Rather than engineering around a temporary limitation, they focused on making their product amazing for shorter meetings while building their architecture to easily expand when larger context windows became available.

They were right. Within months, new models emerged with 5x the context window. While competitors were still untangling their complex workarounds, Granola simply plugged in the new models and scaled up. When Granola 2.0 launched, it supported the latest "reasoning models" capable of detailed analysis across large numbers of meetings - something they never had to build custom solutions for.
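
To make "plugged in the new models" concrete, here's a minimal sketch of what that kind of architecture can look like. To be clear, this isn't Granola's actual code; the model names, limits, and the `llm` stub are all illustrative assumptions. The point is that the model and its context budget live in config, so an upgrade is a one-line change, and the long-meeting fallback stays deliberately crude because it's written to be deleted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_tokens: int  # advertised context window

# Hypothetical registry; every name and limit here is illustrative.
MODELS = {
    "small-2023": ModelSpec("small-2023", 8_000),
    "big-2024": ModelSpec("big-2024", 128_000),
}

ACTIVE = MODELS["big-2024"]  # upgrading to a new model is this one line

def llm(model: str, prompt: str) -> str:
    """Stand-in for a real completion API call."""
    return f"[{model}] summary of {len(prompt)} chars"

def summarize(transcript: str, transcript_tokens: int) -> str:
    # Reserve room in the window for the model's answer.
    if transcript_tokens + 2_000 <= ACTIVE.context_tokens:
        return llm(ACTIVE.name, transcript)  # the common path: just send it all
    # Deliberately crude fallback for meetings that still don't fit,
    # written to be deleted once context windows catch up.
    return llm(ACTIVE.name, transcript[-4 * (ACTIVE.context_tokens - 2_000):])

print(summarize("word " * 500, transcript_tokens=500))
```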

Sourcegraph's AI coding assistant Cody tells a similar story. As a tool designed to help developers understand their entire codebase, Cody initially faced a significant constraint: early models could only accept 4K-8K tokens of context, far less than needed to understand a complex codebase.

Instead of building a complex, permanent solution to this temporary problem, Sourcegraph implemented simple retrieval techniques while designing their architecture to easily plug in bigger-context models as they arrived. They anticipated that context windows would expand rapidly, so they didn't hard-code to small limits.

When Anthropic's Claude arrived with a 100K token window (later expanding to 200K), Sourcegraph immediately leveraged it. Cody can now accept "large amounts of code context" - essentially entire files or multiple files at once - with "near-perfect recall" of the codebase.
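
Here's a hedged sketch of that plug-and-play idea (my illustration, not Cody's source code): retrieval ranks candidate snippets, and a packer fills whatever budget the active model offers, so no 4K or 8K limit ever gets hard-coded. The snippets and the chars-per-token estimate are placeholders:

```python
def assemble_context(ranked_snippets: list[tuple[float, str]],
                     window_tokens: int,
                     output_budget: int = 1_000) -> str:
    """Greedily pack the highest-scoring code snippets into the window."""
    budget = window_tokens - output_budget
    chosen = []
    for score, text in sorted(ranked_snippets, reverse=True):
        cost = len(text) // 4  # rough chars-to-tokens estimate
        if cost <= budget:
            chosen.append(text)
            budget -= cost
    return "\n\n".join(chosen)

ranked = [
    (0.92, "def resolve_symbol(name): ..."),
    (0.87, "class RepoIndexer: ..."),
    (0.41, "# misc utility helpers ..."),
]

# With an 8K-token model this packs a handful of snippets; point the
# same code at a 200K-token model and it packs entire files, unchanged.
print(assemble_context(ranked, window_tokens=200_000))
```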

The payoff? While competitors were stuck with solutions built around yesteryear's limitations, Sourcegraph delivered the "AI that knows your entire codebase" - a capability that simply wasn't possible with older models but became reality because they built for tomorrow's AI.

Here's what I call the "Constraint Decay Principle": in AI product development, today's technical bottlenecks typically disappear within six months (something new becomes the bottleneck, but rarely along the same dimension).

The math here is pretty clear:

  • Context windows: GPT-4 jumped from 8K to 32K tokens in months, Claude expanded to 100K, Gemini features up to 1 million tokens

  • Reasoning capabilities: Each model generation brings dramatic improvements in logical reasoning, planning, and consistency

  • API costs: Dropped from $0.03 to $0.0004 per 1K tokens for some models

  • Quality improvements: Exponential, not linear
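
If you want to put rough numbers on that decay, here's a back-of-the-envelope sketch. The four-month half-life is an assumption I picked for illustration, not a forecast; it just shows how the $0.03 price quoted above decays past the $0.0004 mark in about two years:

```python
# Back-of-the-envelope sketch of constraint decay as a half-life.
# The 4-month half-life is an illustrative assumption, not a forecast.

def remaining(initial: float, half_life_months: float, months: float) -> float:
    """Size of a constraint after `months`, given its half-life."""
    return initial * 0.5 ** (months / half_life_months)

# How long until a $0.03 per-1K-token price falls below the $0.0004
# figure quoted above, if cost halves every ~4 months?
months = 0
while remaining(0.03, 4, months) > 0.0004:
    months += 1
print(months)  # 25: a 75x drop in roughly two years
```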

Yet most teams build like today's constraints are permanent. It's like optimizing for dial-up internet in 1998 – technically impressive, strategically misguided.

Here's the most counterintuitive part that most product managers miss: Companies like Granola and Sourcegraph made a deliberate, painful trade-off that goes against every product management instinct. They chose to ship a worse customer experience now, knowing they could provide a 10x better one in 6 months.

This is mind-bendingly difficult for most founders and product leaders to accept. We're trained to optimize for the best possible user experience today, to solve every problem we can identify. Deliberately leaving problems unsolved feels like product malpractice. But in the rapidly evolving world of AI, it's actually the smartest move.

The companies that win aren't trying to perfect today's experience at the expense of tomorrow's capabilities. They're making strategic bets on which constraints will disappear quickly, focusing their engineering resources on enduring problems, and designing their products to automatically benefit from the next wave of AI improvements.

Here's a controversial take: in AI products, technical debt isn't what you think it is. It's not bad code or lack of testing. The real technical debt is every line of code you write to work around today's limitations.

  • That complex RAG system you built to handle context windows? Debt.

  • That token-counting dashboard everyone's so proud of? Debt.

  • That sophisticated caching layer to manage API costs? Debt.

Every engineering hour spent solving temporary constraints is an hour not spent on solving enduring problems. And unlike traditional technical debt that lingers for years, AI constraint debt comes due in months when new models make your clever workarounds not just unnecessary but actively harmful.

There are two tribes in AI product development.

The first tribe optimizes around today's constraints:

  • Spend 70% of engineering resources on optimization (performance, UX, features,…)

  • Build complex systems to work around model limitations

  • Pride themselves on squeezing every bit of performance from current models

  • End up rebuilding core systems every 6-9 months

⇒ They build a better product today, but pick up AI-technical debt that leaves them unable to increase the product's value much over the next six months.

The second tribe builds for tomorrow:

  • Accept current limitations as temporary

  • Build architectures designed to scale with future improvements

  • Focus engineering on enduring user problems

  • Upgrade seamlessly when new models arrive

⇒ They build a worse product today, but one that gets 10x better within six months.

The pattern is clear across the industry. Every major model release creates a graveyard of optimization-focused startups while future-focused teams simply swap in new models and move on.

Building for tomorrow doesn't mean ignoring today's constraints entirely. It means being strategic about which problems to solve and which ones to wait out. Here's a framework for making this decision:

Sort your technical constraints into three buckets:

Fast-Decay Constraints (Half-life: 3-6 months)

  • Context windows

  • Token limits

  • Raw processing speed

  • Basic capabilities (like tool use)

  • API pricing

Medium-Decay Constraints (Half-life: 1-2 years)

  • Domain-specific knowledge

  • Complex reasoning

  • Multimodal integration

  • Factual accuracy

Slow-Decay Constraints (Half-life: 3+ years)

  • User trust issues

  • Security requirements

  • Regulatory compliance

  • Core business logic
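
One way to make the sorting exercise concrete (my encoding, with made-up backlog items and half-life midpoints for each bucket): tag every constraint and order the backlog so the longest-lived constraints, the ones that deserve real engineering, come first:

```python
# Illustrative half-life midpoints for the three buckets above.
HALF_LIFE_MONTHS = {"fast": 4.5, "medium": 18, "slow": 36}

# Hypothetical backlog; the items are examples, not a recommendation.
backlog = [
    ("context window too small", "fast"),
    ("weak domain-specific knowledge", "medium"),
    ("SOC 2 / regulatory compliance", "slow"),
    ("API pricing too high", "fast"),
]

# Longest-lived constraints first: those deserve the real engineering.
for name, bucket in sorted(backlog, key=lambda c: -HALF_LIFE_MONTHS[c[1]]):
    print(f"{name:32s} ~{HALF_LIFE_MONTHS[bucket]}-month half-life ({bucket})")
```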

For each Fast-Decay constraint, ask:

  1. What's the simplest possible workaround that delivers value now?

  2. Can we design this workaround to be easily removed later?

  3. Would waiting 3-6 months for better models be catastrophic?

If you can find a simple workaround that's easily removable, and waiting would be truly catastrophic, then implement it. Otherwise, design around the constraint with a future-ready architecture.
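
Encoded as a predicate, that rule looks like this (again, a sketch of my reading, not the author's code):

```python
def work_around_now(simple_workaround_exists: bool,
                    easily_removable: bool,
                    waiting_is_catastrophic: bool) -> bool:
    """Rule for fast-decay constraints: all three answers must be yes."""
    return simple_workaround_exists and easily_removable and waiting_is_catastrophic

# Small context window, cheap truncation stopgap, users blocked today:
print(work_around_now(True, True, True))   # True -> ship the stopgap
# Same constraint, but users can tolerate short inputs for a quarter:
print(work_around_now(True, True, False))  # False -> design around it and wait
```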

This future-ready approach creates fascinating competitive dynamics. Teams that waste resources optimizing for temporary constraints end up with complex, brittle systems just as those constraints disappear. Meanwhile, teams that build with future capabilities in mind can immediately leverage new models to deliver capabilities their competitors can't match.

Granola's bet on larger context windows, Sourcegraph's plug-and-play model approach, and Sudowrite's anticipation of million-token models - all these examples show the same pattern. They built for tomorrow's AI capabilities while delivering value with today's technology.

Not every limitation is temporary. Sometimes, you need to build for today despite future improvements. Watch for these signs:

  • Users need the solution now, even if imperfect

  • The problem won't be solved by better models alone

  • The cost of waiting exceeds the cost of rebuilding (which is hard to estimate, because exponentials are involved, but it's worth attempting)

Think of it like the early days of mobile apps. Yes, phones would get more powerful, but people needed apps that worked on the phones they had. The key was building in ways that could scale up naturally, not waiting for perfect conditions.
