Show HN: Zeno – A framework for verifiable RL rewards (code, math, and more)


Verifiable RL rewards for LLMs that actually make sense.

  • Provides verifiable, interpretable reward functions for finetuning LLMs, starting with code, with other domains coming soon.
  • Keeps all rewards transparent and hackable.
  • No secret sauce. No LLM-as-a-judge. No magical "alignment". Just actual auditable rules.

If you care about trusting your rewards and being able to debug them, start with Zeno.

git clone https://github.com/think-a-tron/zeno.git
cd zeno
uv add ./zeno

For now, Zeno ships with a set of plug-and-play reward functions for Python code completions. All rewards are stateless, verifiable, and require no extra config or setup.

| Reward                    | What it rewards or penalizes                            | Score Range      |
| ------------------------- | ------------------------------------------------------- | ---------------- |
| reward_docstrings         | Proportion of functions/classes with docstrings         | 0.0 to 1.0       |
| reward_lint               | Fewer lint errors (via ruff), normalized per line       | 0.0 to 1.0       |
| reward_direct_recursion   | Presence/absence of direct recursion (configurable)     | 1.0 / 0.0 / -1.0 |
| reward_list_comprehension | Use (or avoidance) of list comprehensions               | 1.0 / 0.0 / -1.0 |
| reward_type_hints         | Fraction of args/returns with type hints                | 0.0 to 1.0       |
| reward_exception_handling | Has at least one try/except (reward) or none (penalize) | 1.0 / -1.0       |
| reward_functional         | Prefers functional (no class) over OOP style            | 1.0 / -1.0       |
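
Because each reward is a stateless function, you can sanity-check it outside of training. A minimal sketch, assuming the rewards follow trl's custom reward convention (a list of completion strings in, one float per completion out); the exact call signature is an assumption, so check the repo:

from zeno.code.python import reward_docstrings, reward_type_hints

# Two toy completions: one documented and typed, one bare.
completions = [
    'def add(a: int, b: int) -> int:\n    """Add two ints."""\n    return a + b',
    "def add(a, b):\n    return a + b",
]

# Assumed signature: list of completions in, one score per completion out.
print(reward_docstrings(completions))  # expected: something like [1.0, 0.0]
print(reward_type_hints(completions))  # expected: something like [1.0, 0.0]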

You can plug them directly into trl as reward functions:

from trl import GRPOTrainer
from zeno.code.python import reward_lint

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=[reward_lint],
)
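
Several rewards can run side by side in the same trainer. A sketch, assuming your trl version supports the reward_weights option on GRPOConfig (otherwise the rewards are simply combined with equal weight):

from trl import GRPOConfig, GRPOTrainer
from zeno.code.python import reward_docstrings, reward_lint, reward_type_hints

# Weight lint cleanliness twice as heavily as docstrings and type hints.
args = GRPOConfig(reward_weights=[2.0, 1.0, 1.0])

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=[reward_lint, reward_docstrings, reward_type_hints],
    args=args,
)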

Want to add more rewards? PRs are welcome, but keep them verifiable and explainable. No black-box models.
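
For a sense of scale, a new rule can be a single AST check. A hypothetical sketch (the name and interface here are assumptions, not part of Zeno):

import ast

def reward_no_bare_except(completions, **kwargs):
    """+1.0 if a completion has no bare `except:` clauses, -1.0 if it does,
    0.0 if the completion is not valid Python."""
    scores = []
    for code in completions:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            scores.append(0.0)
            continue
        has_bare_except = any(
            isinstance(node, ast.ExceptHandler) and node.type is None
            for node in ast.walk(tree)
        )
        scores.append(-1.0 if has_bare_except else 1.0)
    return scores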

On the roadmap:

  • Stepwise rewards for math reasoning.
  • Multi-turn rewards for tool use.

MIT
