We're building cross-functional AI-native teams

The emergence of large language models (LLMs) is transforming how we build intelligent systems. Traditional machine learning (ML) workflows typically follow a rigid path: collect labeled data, build training & inference infrastructure, train and validate models, and finally integrate them into product features. But in today’s AI-native environment, especially with general-purpose models like LLMs, that flow is being flipped on its head. We can start with the product, prototype fast, and use real interactions to inform data and model needs later.

I’ve worked on both sides of this shift. In previous roles, I helped launch traditional ML workflows that required extensive dataset curation and infrastructure, and where product validation was delayed (sometimes by years instead of quarters).

At Thumbtack, we’re taking a different approach within our R&D org. We’re building AI-native systems that are product-first by default. We start by designing an experience, then use context engineering techniques to rapidly test our ideas in the real world. The user feedback loop is immediate, and that’s changing how we work across the board.

Before we dig deeper, one nuance is worth understanding: Machine Learning, Deep Learning, Generative AI, and Large Language Models are all subsets of a much older field, Artificial Intelligence.

Here’s the typical flow we’ve seen with traditional ML:

  1. Identify a high-value use case that could be solved with ML (classification, recommendation systems, forecasting, etc.).
  2. Spend weeks or months collecting labeled data.
  3. Build infrastructure for data pipelines, feature engineering, training, and evaluation.
  4. Tune and validate models.
  5. Finally integrate into the product and hope it delivers impact.

The bottleneck here is feedback. Product validation often comes too late, after months of technical investment. Meanwhile, teams are siloed: ML engineers & applied scientists focus on models, infra teams worry about operational constraints like latency & scalability, and product & design wait until the end to assess whether it’s even useful.

At Thumbtack, we’re embracing a new way of building intelligent systems, one that starts with product needs, not data pipelines. The core unit of work isn’t just a model. It’s an AI-powered experience.

We start by defining the user problem. We design the ideal experience and build functional prototypes using LLMs and function calling. We test these prototypes with real users and evolve them into production-quality flows.
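
As an illustration, here’s a minimal sketch of what one of these function-calling prototypes can look like. It assumes the OpenAI Python SDK and chat completions API; the `search_pros` helper, its schema, and the model choice are hypothetical stand-ins rather than our production code.

```python
import json

from openai import OpenAI  # assumes the openai Python SDK (v1+)

client = OpenAI()

# Hypothetical product function the model can invoke in a prototype.
def search_pros(category: str, zip_code: str) -> list:
    # A real prototype would call an internal search service here.
    return [{"name": "Example Pro", "category": category, "zip": zip_code}]

# Describe the function to the model so it can decide when to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "search_pros",
        "description": "Find home service pros for a category near a zip code.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "zip_code": {"type": "string"},
            },
            "required": ["category", "zip_code"],
        },
    },
}]

messages = [{"role": "user", "content": "My kitchen faucet is leaking. I'm in 94103."}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

# If the model chose to invoke the function, execute it with its arguments.
message = response.choices[0].message
if message.tool_calls:
    args = json.loads(message.tool_calls[0].function.arguments)
    print(search_pros(**args))
```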

This shift in approach has enabled us to ship multiple high-quality, AI-driven systems within a single quarter. That speed is only possible because of tight feedback loops and product-first development. We’re not just prototyping; we’re operationalizing intelligence through integrated loops of design, engineering, and user feedback.

This AI-native approach is also part of our broader push to embrace AI with a new GenAI strategy at Thumbtack. We’re replatforming how search, diagnosis, and decision-making work when a user has a home service problem.

It’s worth noting that this shift isn’t just happening at Thumbtack; it reflects a broader industry-wide transformation in how intelligent systems are built. We’re glad to be innovating at the forefront of this movement, actively shaping what AI-native product development looks like.

To put this shift in perspective, Andrej Karpathy described three generations of software in his recent YC Startup School talk:

  • Software 1.0: Code written by humans that instructs computers to perform a task.
  • Software 2.0: Weights that program neural nets.
  • Software 3.0: Prompts written in natural language to steer LLMs, a.k.a. programming the neural nets with English.

In his talk, he also called out that we are entering a generational shift in which a lot of software will be rewritten. We are starting to see this trend at Thumbtack, and as we make progress, we realize that all three generations of software will need to co-exist to create a magical experience for the user.

With Software 2.0, we evolved Thumbtack’s search, relevance, and recommendation systems using traditional ML models driven by data, trained weights, and optimized pipelines. But adapting those systems to newer interaction paradigms requires significant time, expertise, and resources, and drastically increases time to impact.

With the breakthrough in LLMs, we began adopting Software 3.0. Instead of retraining models from scratch, we now use natural language prompts to prototype, iterate, and deploy intelligent experiences faster. Software 3.0 makes English the new programming language, enabling product managers, designers, and engineers to collaboratively “program” LLM behavior and evolve intelligent systems together.
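
To make that concrete, here’s a hedged sketch of a request-categorization step expressed as a prompt rather than a trained classifier. The category taxonomy and model name are illustrative assumptions, not Thumbtack’s actual system.

```python
from openai import OpenAI

client = OpenAI()

# The "program" is the prompt itself: behavior is specified in English,
# so PMs, designers, and engineers can all read and revise it.
CLASSIFY_PROMPT = """You are a home services assistant.
Classify the user's request into exactly one category:
plumbing, electrical, cleaning, landscaping, or other.
Respond with only the category name."""

def classify_request(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": CLASSIFY_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_request("My outlet sparks when I plug anything in"))  # "electrical"
```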

We also don’t throw away the past. In one of our research deep dives, our applied scientists evaluated embeddings (Software 2.0) vs. prompting (Software 3.0). Prompting offered flexibility, but embeddings delivered stronger performance. The takeaway? Building great products often means blending all three generations of software.
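
For a generic illustration of the embeddings side (not the team’s actual evaluation, which is linked in the references), a Software 2.0-style classifier can be as simple as a nearest-neighbor lookup over labeled examples, assuming an OpenAI-style embeddings API:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list) -> np.ndarray:
    # Illustrative embedding model; any sentence-embedding model would do.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# A tiny labeled index of past requests: data and weights, not prompts.
examples = ["leaking faucet under the sink",
            "breaker keeps tripping",
            "deep clean before moving out"]
labels = ["plumbing", "electrical", "cleaning"]
index = embed(examples)

def classify(text: str) -> str:
    query = embed([text])[0]
    # Cosine similarity between the query and every indexed example.
    sims = index @ query / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
    return labels[int(np.argmax(sims))]

print(classify("water dripping from the ceiling"))  # nearest labeled neighbor wins
```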

AI-native development is changing how our teams operate:

  • Cross-functional from the beginning: Engineers, PMs, product designers, content designers, and researchers co-design systems together. Content designers play a huge role here: they partner with engineers on prompt writing and prompt versioning, and they shape how the system speaks and behaves. This is programming, just in English.
  • Parallel engineering execution: While backend & applied science lay foundations, native engineering rapidly prototypes with design, using AI tooling to accelerate the process and empower non-technical contributors. This way, all functions contribute concurrently instead of being limited by a waterfall style of working, one function at a time.
  • Continuous prototyping to real-world tests: We don’t throw away early ideas; we use them to rapidly narrow down the solution space. Prototypes evolve into real production tests.

To make this work, we’ve adopted new mechanisms:

  • Build Sprints: A one-week, intensive engineering & product development cycle focused on fast development and iteration. These help us make quick progress on ambiguous, hard 0-to-1 problems. Build sprints have also been instrumental in making fast progress on things like aligning on API contracts and high-level system design for backend flows. In a way, we are conducting low-level scoping by doing instead of just writing RFCs. These sprints also help us evaluate different ML models and technical approaches and figure out where true limitations exist, so we can invest our resources in the right problems sooner.
  • Design Sprints: A one-week rapid design exercise that accelerates the design-thinking side of problem exploration. Design sprints are a focused XFN effort where everyone comes together to brainstorm and expand the solution space for a user problem. We seek to diverge, then quickly converge the idea space to balance thorough exploration with speed.
  • Integrated, fidelity-based UXR workflows: UX research is integrated into every step of the product design process. We test everything from early Figma mocks to medium-fidelity functional prototypes. For particularly novel or engineering-investment-heavy ideas, we often deploy functional prototypes via unmoderated UXR studies using tools like Dscout. These studies provide scalable, directionally accurate signals before we commit serious engineering effort.
  • Tools & prompting techniques: We have adopted tools like the OpenAI Playground to test, refine, and collaborate on prompts. We have also adopted techniques like metaprompting (sketched below) to make our prompts more powerful & improve the quality of outputs from LLMs.
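
Here’s a minimal sketch of the metaprompting idea: using the model itself to critique and rewrite a draft prompt before it ships. The instructions and model choice are illustrative assumptions, not our internal setup.

```python
from openai import OpenAI

client = OpenAI()

draft_prompt = "Summarize the customer's home service request."

# Metaprompting: ask the model to improve the prompt before we use it.
meta_instruction = (
    "You are a prompt engineer. Rewrite the prompt below to be more specific: "
    "state the output format, length limits, and how to handle missing details. "
    "Return only the improved prompt.\n\nPROMPT:\n" + draft_prompt
)

improved = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{"role": "user", "content": meta_instruction}],
).choices[0].message.content

print(improved)  # the improved prompt is then reviewed, versioned, and tested like code
```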

These mechanisms started in one of our R&D orgs, but other teams across Thumbtack are picking them up because they work.

We’re early in this shift, but here are some of the most important lessons we’ve learned:

Benefits:

  • Faster Iteration: We’ve gone from idea to prototype in days, and to production in weeks, not quarters.
  • Lower Upfront Cost: By testing with LLMs and prototyping before investing in model training or infrastructure, we conserve technical resources.
  • Shared Ownership: Designers, engineers, and product managers now shape AI behavior together.

Tradeoffs:

  • Reliability Challenges: LLMs are flexible but non-deterministic. We mitigate this with fallback systems, rule-based overrides, and structured input & output generation (a minimal sketch follows this list). AI evals are critical, and we are currently investing in an AI evals strategy that will enable us to move faster while upholding our quality and our promise to customers and pros.
  • Prompt Lifecycle Management: Prompts need versioning, testing, and quality control, just like code. We’re building internal tools & processes to manage this (see the second sketch after this list).
  • Trust & Safety Risks: LLM behavior must be reviewed by QA, legal, and privacy teams. That adds a layer of process, but it’s essential.
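
Here’s the minimal sketch referenced above: validating structured LLM output and falling back to a rule-based default when it doesn’t conform. It assumes Pydantic v2; the Diagnosis schema and defaults are hypothetical.

```python
import json

from pydantic import BaseModel, ValidationError

class Diagnosis(BaseModel):
    category: str  # e.g. "plumbing"
    urgency: str   # e.g. "low" | "medium" | "high"

# Rule-based override: the safe answer when the model's output is unusable.
RULE_BASED_DEFAULT = Diagnosis(category="other", urgency="medium")

def parse_llm_output(raw: str) -> Diagnosis:
    """Validate the model's JSON output; fall back when it doesn't conform."""
    try:
        return Diagnosis.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        # Non-deterministic output failed validation: take the fallback path.
        return RULE_BASED_DEFAULT

print(parse_llm_output('{"category": "plumbing", "urgency": "high"}'))
print(parse_llm_output("Sorry, I can't help with that."))  # falls back
```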
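
And a deliberately simple sketch of the idea behind prompt lifecycle management: prompts stored as versioned, reviewable artifacts. Our internal tooling is more involved; every name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str

# Prompts live in a registry (in practice, source control plus review),
# so changes are diffable, testable, and easy to roll back.
REGISTRY = {
    ("diagnose_request", "v2"): PromptVersion(
        name="diagnose_request",
        version="v2",
        template="Classify this home service request: {request}. Respond in JSON.",
    ),
}

def render(name: str, version: str, **kwargs) -> str:
    # Pin the version explicitly so experiments and rollbacks are deliberate.
    return REGISTRY[(name, version)].template.format(**kwargs)

print(render("diagnose_request", "v2", request="my heater is rattling"))
```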

These tradeoffs are manageable and worth it. We’re seeing real user impact and faster product cycles as a result. And again: we don’t discard the past. All three software generations (1.0, 2.0, and 3.0) have a place. We use the right one for the right job, but we begin our journey with a product-first mindset and couple it with a Software 3.0 style of execution.

We’re not just building ML systems. We’re designing AI-native experiences.

That requires new loops: tighter, faster, and more cross-functional. It changes who gets to “program” intelligence. It forces us to reframe speed vs. safety, flexibility vs. governance, and quality vs. iteration.

And we believe this can be how intelligent software is built going forward.

If your team is on a similar journey, whether you’re starting from Software 1.0 or blending all three generations, we’d love to hear from you. And if you’re interested in joining us on this journey, come join us!

This shift to AI-native engineering has been a huge change not just in how we build, but in how we work together as a team. It truly took a village. I want to thank every engineer in our org for leaning into this transformation with openness, grit, and the kind of team spirit that reflects Thumbtack’s values of Choose Teamwork and Make It Count.

I’m also incredibly grateful to our cross-functional partners Adele Maynes Romero, Alex Huang, Anais Ziae-Mohseni, Angeline Vu, Erin Hogan, Grace Boatwright, Hannah Siegel, Katie Zhu, Marios Kokkodis, and Valerie Peppers, who helped us navigate ambiguity and move fast with care. From shaping prompts to understanding model behavior and refining product flows, the team brought clarity, creativity, and momentum when we needed it most.

Big thanks to our fellow engineering leads Dan Capo, Joe Soltzberg, and Mallika Porter, who gave their teams the space to lean into this shift and embrace new ways of working. Thanks for backing your teams to move fast, try new things, and make meaningful progress.

And to our pillar leads Jody Allard, Melissa Hribar, Nicole Bacchus, Peter Yeung, Wade Fuller, and Waqas Sheikh: thank you for backing us from day one and pushing us to aim high. Your support gave this effort the strategic air cover it needed to take root and grow.

There are too many people to recognize here, so this is only a handful of them; many more have contributed to making this happen. This work came together because so many people showed up, took risks, and built together!

References:

  1. How We Created a Generative AI Strategy at Thumbtack
  2. Karpathy on Software 1.0 → 2.0 → 3.0
  3. Embeddings vs Prompting
  4. Evolution of Search at Thumbtack
  5. Review Relevance at Thumbtack
  6. Metaprompting
  7. Context Engineering