Show HN: AI PM Evaluation Framework (Open Source)


A rigorous, open-source framework for evaluating Applied AI Product Manager candidates in the era of rapid AI evolution. Built for teams who need builders, not coordinators.

Why I Built This

A conversation between Aakash Gupta and Jaclyn Konzelmann (Google's Director of AI Product) on Aakash's podcast truly inspired me. Jaclyn's AI PM evaluation framework was so clear and rigorous that I immediately thought: "I need to measure myself against this."

"In disaster relief, you don't map safe routes and hide them. You share them."

I started outlining her criteria as a self-assessment, but then realized this shouldn't stay a private checklist. So it became a framework with a specific purpose: to give everyone, especially people from non-traditional backgrounds, an actionable guide on what to build, where to focus, and how to position themselves for AI PM roles.

If you have edits or ideas for improvement, please submit them on GitHub.


[Embedded video: the conversation that inspired this framework]

🏗️ Builders Over Coordinators

The era of pure project management is over. In 2025, AI PMs must ship code, prototype in hours, and demonstrate technical depth through personal projects.

Key shift: From "managed a team that built X" to "I built X in a weekend."

⚡ Velocity as Core Competency

Speed isn't optional—it's existential. AI capabilities evolve weekly. PMs must prototype, test, and iterate faster than the technology changes.

Evidence required: A portfolio of 8-10 concurrent side projects, built in days, not months.

🔬 Deep AI Intuition

Surface-level awareness isn't enough. True AI intuition comes from hands-on building—understanding model limitations, prompt engineering, and architectural tradeoffs.

Non-negotiable: Personal AI projects demonstrating creative application of LLMs, agents, or workflows.

🌐 Building in Public

The best AI PMs share their learning journey publicly—through blogs, GitHub repos, demos, and thought leadership that helps others build.

Signal: Active GitHub, technical blog, or regular AI experimentation shared openly.

The 2025 Paradigm Shift: We're not hiring people to manage AI product development. We're hiring people who can build AI products themselves, then scale that capability through teams.

01

Technical Skills & Hands-On Building

Can they actually build things, or just talk about building?

  • Evidence of recent hands-on coding (GitHub activity, personal projects)
  • Personal AI tools/agents/workflows built and shipped
  • Ability to prototype ideas in hours, not weeks
  • Comfort with modern development tools and workflows
  • Technical curiosity demonstrated through experimentation

Red Flag: No GitHub repos, no personal projects, last code written 5+ years ago

02

Product Thinking & 0-to-1 Leadership

Have they taken something from idea to shipped product?

  • Clear examples of 0-to-1 product launches
  • User-centric problem definition and validation
  • Comfort navigating ambiguity and incomplete information
  • Evidence of product taste and design sensibility
  • Metrics-driven decision making

Strong Signal: Launched multiple products from scratch with measurable user impact

03

AI/ML Knowledge & Deep Intuition

Do they understand AI from hands-on experience, not just articles?

  • Personal AI projects demonstrating model understanding
  • Knowledge of current capabilities and limitations
  • Experience with prompt engineering, fine-tuning, or agent workflows
  • Creative applications of AI to solve real problems
  • Stays current through active experimentation, not passive reading

Critical Test: Can they explain why they chose GPT-4 vs Claude vs Gemini for a specific use case?

04

Communication & Building in Public

Do they share their learning journey and help others build?

  • Active technical blog, Substack, or public documentation
  • GitHub repos with clear README and demos
  • Thought leadership that advances the field
  • Compelling storytelling and narrative structure
  • Ability to explain complex technical concepts simply

Differentiator: 5,000+ engaged followers and regular public sharing of AI-building insights

05

Strategic Thinking & Second-Order Vision

Do they build platforms that enable others, or just first-order features?

  • Platform thinking: tools that get better as AI improves
  • Understanding of ecosystem dynamics and network effects
  • Long-term vision balanced with rapid iteration
  • Ability to identify leverage points and force multipliers
  • Systems thinking applied to product architecture

Question: Are they building for today's AI or tomorrow's?

06

Execution & Rapid Shipping

Do they treat ideas as cheap and execution as everything?

  • Portfolio of projects shipped in days, not months
  • 8-10 concurrent side projects demonstrating breadth
  • Bias toward action over analysis paralysis
  • Comfortable with imperfect v1s and rapid iteration
  • Language of building: "I shipped" not "I managed"

Litmus Test: Can they build and ship a demo in a weekend?

Evaluation Flow

Step 1: Minimum Thresholds (Pass/Fail)

Personal AI Projects: Must have at least 1 visible AI project with code/demo

Building in Public: Evidence of sharing work (GitHub, blog, demos)

Resume Creativity: Shows product taste beyond the standard LinkedIn template

Fail any threshold → No Screen

Step 2: Red Flags (Disqualifiers)

  • Job hopping without a clear narrative
  • Inflated titles
  • Vague responsibilities
  • No concrete metrics
  • Plagiarism or misrepresentation

🚩 Any red flag → No Screen

Step 3: Must-Have Signals (5/5 Required)

  1. Evidence of continuous learning and staying current with AI
  2. At least 1 personal AI project with evidence
  3. Experience shipping products (not just planning)
  4. Compelling narrative explaining their journey
  5. Clear alignment with AI/ML product space

⚠️ Missing any → Maybe (at best)

Step 4: Differentiation Signals (3+ for Strong Screen)

  • 8-10 concurrent side projects
  • Built something in hours/days, not months
  • Active technical blog or significant following
  • Open-source contributions or community leadership
  • Platform/framework thinking in past work
  • Speaking at conferences or thought leadership
  • Unique background or unconventional path
  • Evidence of rapid prototyping velocity

3+ signals → Strong Screen
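
To make the gating explicit, here is a minimal Python sketch of the four steps above. The `Candidate` fields and function names are illustrative, not the repo's actual data model, and it assumes a plain "Screen" outcome when a candidate clears every gate but shows fewer than three differentiation signals:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    has_ai_project: bool        # Step 1: at least 1 visible AI project
    builds_in_public: bool      # Step 1: GitHub, blog, or demos
    creative_resume: bool       # Step 1: taste beyond the standard template
    red_flags: list[str] = field(default_factory=list)  # Step 2
    must_have_signals: int = 0  # Step 3: count out of 5
    differentiators: int = 0    # Step 4: count of signals

def screen(c: Candidate) -> str:
    # Step 1: minimum thresholds are pass/fail
    if not (c.has_ai_project and c.builds_in_public and c.creative_resume):
        return "No Screen"
    # Step 2: any single red flag disqualifies
    if c.red_flags:
        return "No Screen"
    # Step 3: all five must-have signals are required
    if c.must_have_signals < 5:
        return "Maybe"
    # Step 4: three or more differentiation signals -> Strong Screen
    return "Strong Screen" if c.differentiators >= 3 else "Screen"
```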

Scoring System

| Pillar | Weight | Max Score | Evaluation Focus |
|---|---|---|---|
| Technical Skills | 10 points | 10/10 | GitHub activity, personal projects, code quality |
| Product Thinking | 10 points | 10/10 | 0-to-1 launches, user impact, metrics |
| AI/ML Knowledge | 10 points | 10/10 | Personal AI projects, hands-on evidence |
| Communication | 10 points | 10/10 | Public building, blog, thought leadership |
| Strategic Thinking | 10 points | 10/10 | Platform thinking, second-order effects |
| Execution | 10 points | 10/10 | Shipping velocity, portfolio breadth |
| **Total** | **60 points** | **60/60** | Aggregate across all pillars |
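
Since every pillar is weighted equally at 10 points, the aggregate is a simple sum. A quick sketch (the function name is mine, not the repo's):

```python
PILLARS = [
    "Technical Skills", "Product Thinking", "AI/ML Knowledge",
    "Communication", "Strategic Thinking", "Execution",
]

def total_score(scores: dict[str, int]) -> int:
    # Each pillar is scored 0-10; six pillars give a 60-point maximum.
    for pillar in PILLARS:
        if not 0 <= scores.get(pillar, 0) <= 10:
            raise ValueError(f"{pillar} must be scored 0-10")
    return sum(scores.get(p, 0) for p in PILLARS)

# Example: strong builder, weaker on strategy -> 47/60
total_score({
    "Technical Skills": 9, "Product Thinking": 8, "AI/ML Knowledge": 9,
    "Communication": 7, "Strategic Thinking": 5, "Execution": 9,
})
```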

🔄 For Resume Screening

Use the automated analyzer to get AI-powered analysis from multiple providers (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro):

```bash
bin/analyze --deep-analysis resume.pdf
```

Generates comprehensive HTML reports with consensus scoring and detailed pillar breakdowns.
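
The report format is the repo's, but "consensus scoring" can be understood as averaging per-pillar scores across providers and flagging where they disagree. This is a hedged sketch of that idea, not the analyzer's actual implementation:

```python
from statistics import mean, stdev

def consensus(per_provider: dict[str, dict[str, int]],
              disagreement: float = 2.0) -> dict[str, float]:
    """per_provider maps provider name -> {pillar: 0-10 score}."""
    pillars = next(iter(per_provider.values()))
    result = {}
    for pillar in pillars:
        scores = [s[pillar] for s in per_provider.values()]
        result[pillar] = round(mean(scores), 1)
        # High spread means the models read the resume differently;
        # worth a human look before trusting the average.
        if len(scores) > 1 and stdev(scores) > disagreement:
            print(f"Providers disagree on {pillar}: {scores}")
    return result
```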

👥 For Interview Panels

Share the framework with all interviewers beforehand. Use pillar-specific questions to probe each area:

  • "Walk me through your GitHub repos"
  • "What did you build last weekend?"
  • "Show me your AI experiments"

📊 For Calibration

Run multiple candidates through the framework and compare scores. Calibrate your team's understanding of what "Strong Screen" looks like in practice.
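
For example, a calibration pass might total a handful of candidates and sort them so the team can debate where the "Strong Screen" line actually sits. The names and numbers below are hypothetical, not real candidates:

```python
# Totals out of 60 from the scoring rubric above; made-up calibration data.
batch = {"Candidate A": 52, "Candidate B": 47, "Candidate C": 38}
for name, score in sorted(batch.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score}/60")
```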

🔧 For Customization

Fork the GitHub repo and adapt the framework to your needs. Adjust weights, add custom criteria, or modify the scoring rubric.
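
One way to adjust weights, assuming you keep the 0-10 per-pillar scores and multiply by a custom weight map (the repo's actual config format may differ):

```python
# Hypothetical reweighting for a deeply technical role.
CUSTOM_WEIGHTS = {
    "Technical Skills": 1.5,   # emphasize hands-on building
    "Product Thinking": 1.0,
    "AI/ML Knowledge": 1.5,
    "Communication": 0.75,
    "Strategic Thinking": 0.75,
    "Execution": 1.5,
}

def weighted_total(scores: dict[str, int]) -> float:
    return sum(scores[p] * w for p, w in CUSTOM_WEIGHTS.items())
```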

We welcome feedback, improvements, and ideas! Submit pull requests or open issues on GitHub to help make this framework better for everyone.

View on GitHub →

Ready to Raise Your Hiring Bar?

Start evaluating AI PM candidates with the 2025 standards. Open source, free to use, and continuously evolving.
