Show HN: AI PM Evaluation Framework (Open Source)


A rigorous, open-source framework for evaluating Applied AI Product Manager candidates in the era of rapid AI evolution. Built for teams who need builders, not coordinators.

Why I Built This

A conversation between Aakash Gupta and Jaclyn Konzelmann (Google's Director of AI Product) on Aakash's podcast truly inspired me. Jaclyn's AI PM evaluation framework was so clear and rigorous that I immediately thought: "I need to measure myself against this."

"In disaster relief, you don't map safe routes and hide them. You share them."

I started outlining her criteria as a self-assessment, but then realized this shouldn't stay a private checklist. So it became a framework with a specific purpose: to give everyone, especially people from non-traditional backgrounds, an actionable guide on what to build, where to focus, and how to position themselves for AI PM roles.

If you have edits or ideas for improvement, please submit them on GitHub.


[Embedded video: the conversation that inspired this framework]

🏗️ Builders Over Coordinators

The era of pure project management is over. In 2025, AI PMs must ship code, prototype in hours, and demonstrate technical depth through personal projects.

Key shift: From "managed a team that built X" to "I built X in a weekend."

⚡ Velocity as Core Competency

Speed isn't optional—it's existential. AI capabilities evolve weekly. PMs must prototype, test, and iterate faster than the technology changes.

Evidence required: A portfolio of 8-10 concurrent side projects, built in days, not months.

🔬 Deep AI Intuition

Surface-level awareness isn't enough. True AI intuition comes from hands-on building—understanding model limitations, prompt engineering, and architectural tradeoffs.

Non-negotiable: Personal AI projects demonstrating creative application of LLMs, agents, or workflows.

🌐 Building in Public

The best AI PMs share their learning journey publicly—through blogs, GitHub repos, demos, and thought leadership that helps others build.

Signal: Active GitHub, technical blog, or regular AI experimentation shared openly.

The 2025 Paradigm Shift: We're not hiring people to manage AI product development. We're hiring people who can build AI products themselves, then scale that capability through teams.

01

Technical Skills & Hands-On Building

Can they actually build things, or just talk about building?

  • Evidence of recent hands-on coding (GitHub activity, personal projects)
  • Personal AI tools/agents/workflows built and shipped
  • Ability to prototype ideas in hours, not weeks
  • Comfort with modern development tools and workflows
  • Technical curiosity demonstrated through experimentation

Red Flag: No GitHub repos, no personal projects, last code written 5+ years ago

02

Product Thinking & 0-to-1 Leadership

Have they taken something from idea to shipped product?

  • Clear examples of 0-to-1 product launches
  • User-centric problem definition and validation
  • Comfort navigating ambiguity and incomplete information
  • Evidence of product taste and design sensibility
  • Metrics-driven decision making

Strong Signal: Launched multiple products from scratch with measurable user impact

03

AI/ML Knowledge & Deep Intuition

Do they understand AI from hands-on experience, not just articles?

  • Personal AI projects demonstrating model understanding
  • Knowledge of current capabilities and limitations
  • Experience with prompt engineering, fine-tuning, or agent workflows
  • Creative applications of AI to solve real problems
  • Stays current through active experimentation, not passive reading

Critical Test: Can they explain why they chose GPT-4 vs Claude vs Gemini for a specific use case?

04

Communication & Building in Public

Do they share their learning journey and help others build?

  • Active technical blog, Substack, or public documentation
  • GitHub repos with clear README and demos
  • Thought leadership that advances the field
  • Compelling storytelling and narrative structure
  • Ability to explain complex technical concepts simply

Differentiator: 5,000+ engaged followers and regular public sharing of AI-building insights

05

Strategic Thinking & Second-Order Vision

Do they build platforms that enable others, or just first-order features?

  • Platform thinking: tools that get better as AI improves
  • Understanding of ecosystem dynamics and network effects
  • Long-term vision balanced with rapid iteration
  • Ability to identify leverage points and force multipliers
  • Systems thinking applied to product architecture

Question: Are they building for today's AI or tomorrow's?

06

Execution & Rapid Shipping

Do they treat ideas as cheap and execution as everything?

  • Portfolio of projects shipped in days, not months
  • 8-10 concurrent side projects demonstrating breadth
  • Bias toward action over analysis paralysis
  • Comfortable with imperfect v1s and rapid iteration
  • Language of building: "I shipped" not "I managed"

Litmus Test: Can they build and ship a demo in a weekend?

Evaluation Flow

Step 1: Minimum Thresholds (Pass/Fail)

Personal AI Projects: Must have at least 1 visible AI project with code/demo

Building in Public: Evidence of sharing work (GitHub, blog, demos)

Resume Creativity: Shows product taste beyond the standard LinkedIn template

Fail any threshold → No Screen

Step 2: Red Flags (Disqualifiers)

  • Job hopping without a clear narrative
  • Inflated titles
  • Vague responsibilities
  • No concrete metrics
  • Plagiarism or misrepresentation

🚩 Any red flag → No Screen

Step 3: Must-Have Signals (5/5 Required)

  1. Evidence of continuous learning and staying current with AI
  2. At least 1 personal AI project with evidence
  3. Experience shipping products (not just planning)
  4. Compelling narrative explaining their journey
  5. Clear alignment with AI/ML product space

⚠️ Missing any → Maybe (at best)

Step 4: Differentiation Signals (3+ for Strong Screen)

  • 8-10 concurrent side projects
  • Built something in hours/days, not months
  • Active technical blog or significant following
  • Open-source contributions or community leadership
  • Platform/framework thinking in past work
  • Speaking at conferences or thought leadership
  • Unique background or unconventional path
  • Evidence of rapid prototyping velocity

3+ signals → Strong Screen
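
To make the gating explicit, here is a minimal Python sketch of the four steps above. The `Candidate` fields and function names are illustrative, not the repo's actual data model, and it assumes a plain "Screen" outcome when a candidate clears every gate but shows fewer than three differentiation signals:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    has_ai_project: bool        # Step 1: at least 1 visible AI project
    builds_in_public: bool      # Step 1: GitHub, blog, or demos
    creative_resume: bool       # Step 1: taste beyond the standard template
    red_flags: list[str] = field(default_factory=list)  # Step 2
    must_have_signals: int = 0  # Step 3: count out of 5
    differentiators: int = 0    # Step 4: count of signals

def screen(c: Candidate) -> str:
    # Step 1: minimum thresholds are pass/fail
    if not (c.has_ai_project and c.builds_in_public and c.creative_resume):
        return "No Screen"
    # Step 2: any single red flag disqualifies
    if c.red_flags:
        return "No Screen"
    # Step 3: all five must-have signals are required
    if c.must_have_signals < 5:
        return "Maybe"
    # Step 4: three or more differentiation signals -> Strong Screen
    return "Strong Screen" if c.differentiators >= 3 else "Screen"
```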

Scoring System

| Pillar | Weight | Max Score | Evaluation Focus |
|---|---|---|---|
| Technical Skills | 10 points | 10/10 | GitHub activity, personal projects, code quality |
| Product Thinking | 10 points | 10/10 | 0-to-1 launches, user impact, metrics |
| AI/ML Knowledge | 10 points | 10/10 | Personal AI projects, hands-on evidence |
| Communication | 10 points | 10/10 | Public building, blog, thought leadership |
| Strategic Thinking | 10 points | 10/10 | Platform thinking, second-order effects |
| Execution | 10 points | 10/10 | Shipping velocity, portfolio breadth |
| **Total** | **60 points** | **60/60** | Aggregate across all pillars |
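
Since every pillar is weighted equally at 10 points, the aggregate is a simple sum. A quick sketch (the function name is mine, not the repo's):

```python
PILLARS = [
    "Technical Skills", "Product Thinking", "AI/ML Knowledge",
    "Communication", "Strategic Thinking", "Execution",
]

def total_score(scores: dict[str, int]) -> int:
    # Each pillar is scored 0-10; six pillars give a 60-point maximum.
    for pillar in PILLARS:
        if not 0 <= scores.get(pillar, 0) <= 10:
            raise ValueError(f"{pillar} must be scored 0-10")
    return sum(scores.get(p, 0) for p in PILLARS)

# Example: strong builder, weaker on strategy -> 47/60
total_score({
    "Technical Skills": 9, "Product Thinking": 8, "AI/ML Knowledge": 9,
    "Communication": 7, "Strategic Thinking": 5, "Execution": 9,
})
```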

🔄 For Resume Screening

Use the automated analyzer to get AI-powered analysis from multiple providers (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro):

```bash
bin/analyze --deep-analysis resume.pdf
```

Generates comprehensive HTML reports with consensus scoring and detailed pillar breakdowns.
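
The report format is the repo's, but "consensus scoring" can be understood as averaging per-pillar scores across providers and flagging where they disagree. This is a hedged sketch of that idea, not the analyzer's actual implementation:

```python
from statistics import mean, stdev

def consensus(per_provider: dict[str, dict[str, int]],
              disagreement: float = 2.0) -> dict[str, float]:
    """per_provider maps provider name -> {pillar: 0-10 score}."""
    pillars = next(iter(per_provider.values()))
    result = {}
    for pillar in pillars:
        scores = [s[pillar] for s in per_provider.values()]
        result[pillar] = round(mean(scores), 1)
        # High spread means the models read the resume differently;
        # worth a human look before trusting the average.
        if len(scores) > 1 and stdev(scores) > disagreement:
            print(f"Providers disagree on {pillar}: {scores}")
    return result
```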

👥 For Interview Panels

Share the framework with all interviewers beforehand. Use pillar-specific questions to probe each area:

  • "Walk me through your GitHub repos"
  • "What did you build last weekend?"
  • "Show me your AI experiments"

📊 For Calibration

Run multiple candidates through the framework and compare scores. Calibrate your team's understanding of what "Strong Screen" looks like in practice.
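
For example, a calibration pass might total a handful of candidates and sort them so the team can debate where the "Strong Screen" line actually sits. The names and numbers below are hypothetical, not real candidates:

```python
# Totals out of 60 from the scoring rubric above; made-up calibration data.
batch = {"Candidate A": 52, "Candidate B": 47, "Candidate C": 38}
for name, score in sorted(batch.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score}/60")
```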

🔧 For Customization

Fork the GitHub repo and adapt the framework to your needs. Adjust weights, add custom criteria, or modify the scoring rubric.
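
One way to adjust weights, assuming you keep the 0-10 per-pillar scores and multiply by a custom weight map (the repo's actual config format may differ):

```python
# Hypothetical reweighting for a deeply technical role.
CUSTOM_WEIGHTS = {
    "Technical Skills": 1.5,   # emphasize hands-on building
    "Product Thinking": 1.0,
    "AI/ML Knowledge": 1.5,
    "Communication": 0.75,
    "Strategic Thinking": 0.75,
    "Execution": 1.5,
}

def weighted_total(scores: dict[str, int]) -> float:
    return sum(scores[p] * w for p, w in CUSTOM_WEIGHTS.items())
```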

We welcome feedback, improvements, and ideas! Submit pull requests or open issues on GitHub to help make this framework better for everyone.

View on GitHub →

Ready to Raise Your Hiring Bar?

Start evaluating AI PM candidates with the 2025 standards. Open source, free to use, and continuously evolving.
