RfC: Reinforcement for Creativity


A universal framework for training creative agents in symbolic domains

License: MIT · Python 3.8+

RfC (Reinforcement for Creativity) is a learning framework designed to train agents that creatively explore and generate novel solutions rather than simply predict outputs. Unlike traditional neural networks that learn input-output mappings through supervised training, RfC encourages constructive creativity through structured exploration and composite rewards.

| Traditional Neural Networks | RfC Framework                    |
|-----------------------------|----------------------------------|
| Learns to predict outputs   | Learns to explore creatively     |
| Supervised training         | Training with creative incentives|
| Minimizes prediction error  | Maximizes validity and novelty   |
| Memorizes patterns          | Discovers new patterns           |

    ┌─────────────┐      ┌──────────────┐     ┌─────────────────┐
    │ Environment │─────▶│  Generator   │────▶│ Flexible Coach  │
    │  (Context)  │      │   π_θ(a|s)   │     │   (Evaluator)   │
    └─────────────┘      └──────────────┘     └─────────────────┘
                                ▲                      │
                                │                      ▼
                                │                ┌───────────┐
                                │                │ Feedback  │
                                │                │ • Validity│
                                └────────────────│ • Novelty │
                                                 │ • Reward  │
                                                 └───────────┘

  • Environment: Defines the domain and provides context for exploration
  • Generator: Parameterized agent that generates candidate solutions
  • Flexible Coach: Evaluator that assesses validity and novelty
  • RfC Trainer: Orchestrates the creative training loop
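
The way these four components interact can be sketched as a single loop. The block below is a minimal, self-contained illustration and not the framework's own code: `ToyEnvironment`, `ToyGenerator`, `ToyCoach`, and `run_loop` are hypothetical stand-ins, the generator does not learn, and the reward is simplified to a plain sum of validity and novelty.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: asks for an integer divisible by `divisor`."""

    def sample_context(self):
        return {"divisor": 3}


class ToyGenerator:
    """Hypothetical generator: proposes random integers (no learning here)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def generate(self, context):
        return self.rng.randint(1, 30)


class ToyCoach:
    """Hypothetical coach: validity = divisibility, novelty = not seen before."""

    def __init__(self):
        self.corpus = set()

    def evaluate(self, candidate, context):
        validity = 1.0 if candidate % context["divisor"] == 0 else 0.0
        novelty = 0.0 if candidate in self.corpus else 1.0
        if validity == 1.0:
            self.corpus.add(candidate)  # remember valid artifacts only
        return validity, novelty


def run_loop(env, generator, coach, num_episodes):
    """Environment -> Generator -> Coach -> feedback, repeated per episode."""
    history = []
    for _ in range(num_episodes):
        context = env.sample_context()
        candidate = generator.generate(context)
        validity, novelty = coach.evaluate(candidate, context)
        reward = validity + novelty  # simplified stand-in for the real reward
        history.append((candidate, validity, novelty, reward))
    return history


history = run_loop(ToyEnvironment(), ToyGenerator(seed=0), ToyCoach(), num_episodes=50)
valid_novel = [c for c, v, n, _ in history if v == 1.0 and n == 1.0]
print("valid and novel candidates:", sorted(valid_novel))
```

In a real trainer the reward would also be fed back to the generator to update its parameters; that step is left out here to keep the data flow visible.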

To install and run RfC, open the project's .ipynb notebook in Google Colab.

Traditional approach:
"Provide conjecture data and train a model to predict new ones."

RfC approach:
"Explore the space of mathematical conjectures creatively; the Coach evaluates validity and novelty to guide the agent toward maximizing both."

1. Mathematical Conjecture Discovery

  • Generates novel number theory conjectures
  • Verifies validity using symbolic proofs
  • Identifies non-trivial mathematical patterns
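
As a rough illustration of what a validity check for a conjecture might look like, the sketch below tests a candidate statement on a finite range of integers and reports the first counterexample. This is an assumption-laden simplification: it is an empirical test, not the symbolic proof step described above, and `is_prime` and `empirical_validity` are hypothetical helper names.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def empirical_validity(predicate, n_max):
    """Check predicate(n) for n in [0, n_max).

    Returns (fraction of n that satisfy it, first counterexample or None).
    """
    passed = 0
    counterexample = None
    for n in range(n_max):
        if predicate(n):
            passed += 1
        elif counterexample is None:
            counterexample = n
    return passed / n_max, counterexample


# Candidate 1: "n^2 + n is always even" (true for every integer n).
score_even, cex_even = empirical_validity(lambda n: (n * n + n) % 2 == 0, 1000)

# Candidate 2: "n^2 + n + 41 is always prime" (fails at n = 40: 1681 = 41 * 41).
score_euler, cex_euler = empirical_validity(lambda n: is_prime(n * n + n + 41), 100)

print(score_even, cex_even)
print(score_euler, cex_euler)
```

A fractional score like this could serve as a soft validity signal, but a passing range check never establishes a conjecture; only a proof does.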

2. Creative Program Synthesis

  • Synthesizes efficient algorithms creatively
  • Explores iterative, recursive, and formula-based approaches
  • Evaluates correctness and computational efficiency
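
To make the evaluation step concrete, here is a small sketch that scores iterative, recursive, and formula-based candidates for the same task (summing 1..n) on correctness and wall-clock time. The task, the candidate functions, and `evaluate` are hypothetical examples chosen for illustration; the demo in the repository may measure efficiency differently.

```python
import time


def sum_iterative(n):
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_recursive(n):
    return 0 if n == 0 else n + sum_recursive(n - 1)


def sum_formula(n):
    return n * (n + 1) // 2


def sum_off_by_one(n):
    # Deliberately wrong candidate, to show a correctness score below 1.
    return n * (n - 1) // 2


def evaluate(candidate, test_inputs, expected):
    """Return (fraction of correct outputs, elapsed seconds for the candidate)."""
    start = time.perf_counter()
    outputs = [candidate(n) for n in test_inputs]
    elapsed = time.perf_counter() - start
    correct = sum(o == e for o, e in zip(outputs, expected))
    return correct / len(test_inputs), elapsed


test_inputs = [0, 1, 5, 50, 200]
expected = [sum(range(n + 1)) for n in test_inputs]  # trusted reference answers

candidates = {
    "iterative": sum_iterative,
    "recursive": sum_recursive,
    "formula": sum_formula,
    "off_by_one": sum_off_by_one,
}
results = {name: evaluate(fn, test_inputs, expected) for name, fn in candidates.items()}

for name, (correctness, seconds) in results.items():
    print(f"{name:>10}: correctness={correctness:.2f}, time={seconds:.6f}s")
```

Timings on inputs this small are noisy, so a coach built along these lines would likely want larger inputs or repeated runs before treating speed as a reward signal.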

3. Logical Rule Discovery

  • Discovers new inference rules in formal logic
  • Generates valid inference patterns
  • Verifies logical soundness and identifies novel proof strategies
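
For propositional logic, soundness of a candidate inference rule can be checked exhaustively with a truth table: the rule is sound if every assignment that satisfies all premises also satisfies the conclusion. The sketch below shows that check; `implies` and `is_sound` are hypothetical helpers, and the repository's demo may verify rules in another way.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


def is_sound(premises, conclusion, num_vars):
    """Truth-table check of an inference rule.

    Returns (True, None) if sound, else (False, counterexample assignment).
    """
    for values in product([False, True], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False, values
    return True, None


# Modus ponens: from p and p -> q, infer q.
modus_ponens = is_sound(
    [lambda p, q: p, lambda p, q: implies(p, q)],
    lambda p, q: q,
    num_vars=2,
)

# Affirming the consequent: from q and p -> q, infer p (a classic fallacy).
affirming_consequent = is_sound(
    [lambda p, q: q, lambda p, q: implies(p, q)],
    lambda p, q: p,
    num_vars=2,
)

# Hypothetical syllogism: from p -> q and q -> r, infer p -> r.
hypothetical_syllogism = is_sound(
    [lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
    lambda p, q, r: implies(p, r),
    num_vars=3,
)

print(modus_ponens, affirming_consequent, hypothetical_syllogism)
```

The check costs 2^n evaluations for n variables, which is fine for small rules but does not extend to first-order logic.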

RfC is fully modular and extensible. Here's how to define your own creative domain:

Step 1: Define the Environment

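    class MyEnvironment(Environment):
        def sample_context(self):
            # Return one problem instance for the agent to explore
            return {'type': 'my_problem', 'params': {...}}

        def render_context(self, context):
            # Human-readable description of the context
            return f"Problem: {context}"

        def get_action_space(self):
            # Describe the kinds of candidates the generator may produce
            return {'type': 'my_actions'}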

Step 2: Define the Flexible Coach

    class MyCoach(FlexibleCoach):
        def evaluate_validity(self, candidate, context):
            validity_score = ...  # Your validation logic
            metadata = {...}
            return validity_score, metadata

        def compute_novelty(self, candidate):
            novelty_score = ...  # Your novelty measure
            return novelty_score

        def generate_feedback(self, candidate, context, validity_score, novelty_score):
            suggestions = [...]
            return Feedback(
                validity_score=validity_score,
                novelty_score=novelty_score,
                metadata={},
                suggestions=suggestions
            )
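
The source does not say how `compute_novelty` should be implemented. One plausible option, shown here purely as an assumption, is to score a candidate by its distance from the closest previously seen artifact, using Jaccard similarity over token sets and a bounded corpus (mirroring the `max_corpus_size` setting). `jaccard` and `NoveltyCorpus` are hypothetical names.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity between two token collections (1.0 = identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class NoveltyCorpus:
    """Bounded memory of past artifacts; oldest entries are evicted first."""

    def __init__(self, max_corpus_size=1000):
        self.items = deque(maxlen=max_corpus_size)

    def novelty(self, candidate_tokens):
        """1 minus the similarity to the most similar stored artifact."""
        if not self.items:
            return 1.0
        return 1.0 - max(jaccard(candidate_tokens, item) for item in self.items)

    def add(self, candidate_tokens):
        self.items.append(tuple(candidate_tokens))


corpus = NoveltyCorpus(max_corpus_size=2)
seen = ["n", "^", "2", "+", "n"]

first = corpus.novelty(seen)       # empty corpus: maximally novel
corpus.add(seen)
repeat = corpus.novelty(seen)      # exact repeat: not novel
partial = corpus.novelty(["n", "^", "3", "+", "1"])  # shares 3 of 6 distinct tokens

corpus.add(["a"])
corpus.add(["b"])                  # corpus is full, so `seen` is evicted
after_eviction = corpus.novelty(seen)

print(first, repeat, partial, after_eviction)
```

Token-set overlap ignores ordering, so for structured artifacts such as expressions or programs a tree or edit distance would likely discriminate better.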

Step 3: Define the Generator

    class MyGenerator(Generator):
        def generate(self, context, temperature=None):
            candidate = ...  # Your generative logic
            return candidate

        def log_prob(self, candidate, context):
            # Log-probability of the candidate under the current policy
            return ...

        def update_parameters(self, gradient):
            # Gradient ascent step on the reward
            self.parameters += self.config.learning_rate * gradient
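
To show how `generate`, `log_prob`, and `update_parameters` can fit together, here is a standalone sketch of a softmax policy over a fixed set of symbolic actions, trained with a REINFORCE-style gradient. It is an illustration under stated assumptions: `SoftmaxGenerator` and `policy_gradient` are hypothetical, the class does not inherit from the framework's `Generator`, and the source does not confirm that RfC uses this particular estimator.

```python
import math
import random


class SoftmaxGenerator:
    """Hypothetical generator: one logit per symbolic action."""

    def __init__(self, actions, learning_rate=0.1, seed=0):
        self.actions = list(actions)
        self.parameters = [0.0] * len(self.actions)
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def probabilities(self):
        """Numerically stable softmax over the logits."""
        m = max(self.parameters)
        exps = [math.exp(p - m) for p in self.parameters]
        total = sum(exps)
        return [e / total for e in exps]

    def generate(self):
        return self.rng.choices(self.actions, weights=self.probabilities(), k=1)[0]

    def log_prob(self, action):
        return math.log(self.probabilities()[self.actions.index(action)])

    def policy_gradient(self, action, reward):
        """REINFORCE estimate: reward * d/d(logits) of log pi(action)."""
        probs = self.probabilities()
        chosen = self.actions.index(action)
        return [reward * ((1.0 if i == chosen else 0.0) - p) for i, p in enumerate(probs)]

    def update_parameters(self, gradient):
        """Gradient ascent step, matching the sign used in Step 3."""
        self.parameters = [
            p + self.learning_rate * g for p, g in zip(self.parameters, gradient)
        ]


gen = SoftmaxGenerator(["iterative", "recursive", "formula"])
for _ in range(200):
    action = gen.generate()
    reward = 1.0 if action == "formula" else 0.0  # toy reward for illustration
    gen.update_parameters(gen.policy_gradient(action, reward))

probs = gen.probabilities()
print(dict(zip(gen.actions, probs)))
```

With this toy reward the probability of `"formula"` rises above its initial 1/3; in practice the reward would come from the Coach rather than a hard-coded rule.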

With the three components defined, train the agent and generate new artifacts:

    config = RfCConfig()
    env = MyEnvironment()
    coach = MyCoach()
    generator = MyGenerator(config)

    rfc = RfC(generator, coach, env, config)
    rfc.train(num_episodes=1000)
    results = rfc.create(n_samples=10)

The configuration exposes the reward weights, thresholds, and training settings:

    config = RfCConfig(
        lambda_v=1.0,           # Validity weight
        lambda_n=0.5,           # Novelty weight
        lambda_c=0.01,          # Complexity penalty weight
        tau_v=0.7,              # Validity threshold
        tau_n=0.6,              # Novelty threshold
        alpha=0.8,              # Novelty exponent
        learning_rate=0.001,    # Learning rate
        max_corpus_size=1000,   # Maximum corpus size
        temperature=1.0         # Sampling temperature
    )
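
The composite reward is:

    R(a|s) = λ_v · sigmoid(v - τ_v) + λ_n · η^α - λ_c · C(a)

Where:

  • v = validity score
  • η = novelty score
  • C(a) = complexity penalty for candidate a
  • λ_v, λ_n, λ_c = validity, novelty, and complexity weights
  • τ_v = validity threshold
  • α = novelty exponent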
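
The reward formula translates directly into code. The sketch below is a standalone transcription, not the framework's implementation: `composite_reward` is a hypothetical name, and the defaults are copied from the example configuration. The formula as written does not use the novelty threshold `tau_n`, so it does not appear here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def composite_reward(v, eta, complexity,
                     lambda_v=1.0, lambda_n=0.5, lambda_c=0.01,
                     tau_v=0.7, alpha=0.8):
    """R(a|s) = lambda_v * sigmoid(v - tau_v) + lambda_n * eta**alpha - lambda_c * C(a)."""
    validity_term = lambda_v * sigmoid(v - tau_v)
    novelty_term = lambda_n * eta ** alpha
    complexity_term = lambda_c * complexity
    return validity_term + novelty_term - complexity_term


# Validity exactly at the threshold, no novelty, no complexity: sigmoid(0) = 0.5.
at_threshold = composite_reward(v=0.7, eta=0.0, complexity=0.0)

# Same validity, maximal novelty, complexity 10: 0.5 + 0.5 - 0.1.
novel_but_complex = composite_reward(v=0.7, eta=1.0, complexity=10.0)

print(at_threshold, novel_but_complex)
```

Because the validity term is a sigmoid of `v - tau_v` with unit slope, it varies only mildly over `v` in [0, 1]; the threshold shifts the curve rather than acting as a hard cutoff.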

Training returns summary statistics:

    stats = rfc.train(
        num_episodes=1000,
        verbose=True,
        log=True,
        plot=True
    )

    print(f"Average validity: {stats['avg_validity']}")
    print(f"Average novelty: {stats['avg_novelty']}")
    print(f"Valid artifacts: {stats['valid_artifacts']}")
    print(f"Novel artifacts: {stats['novel_artifacts']}")

RfC excels in domains where:

  • ✅ Validity rules can be clearly defined
  • ✅ Creative exploration is desired
  • ✅ Novelty is as important as correctness
  • ✅ The search space is combinatorial or symbolic

Example application areas:

  • Mathematics: Theorem and conjecture discovery
  • Programming: Algorithm synthesis, code optimization
  • Formal Logic: New inference rules, proof tactics
  • Game Design: Novel mechanics, rule balancing
  • Molecular Design: Valid structures with desired properties
  • Music Composition: Novel harmonic progressions
  • Architecture: Creative structural designs

Is RfC a standard RL algorithm?

No. RfC is specifically designed for constructive creativity with key differences:

  • Separates domain knowledge from the agent
  • Uses a deterministic Flexible Coach with structured feedback
  • Explicitly incentivizes novelty, not only reward
  • Action space is typically symbolic/combinatorial

Do I need a GPU?

Not necessarily:

  • Small domains: CPU sufficient
  • Simple generators: MLPs, rules, templates work well on CPU
  • Large domains: GPU beneficial for Transformers/LLMs

Can I use LLMs as the Generator?

Yes! Any generative model can be integrated:

    class LLMGenerator(Generator):
        def __init__(self, config, llm_model):
            super().__init__(config)
            self.llm = llm_model

        def generate(self, context, temperature=None):
            prompt = f"Generate creative solution for: {context}"
            return self.llm.generate(prompt, temperature=temperature)

How well does RfC scale with the size of the search space?

  • Small spaces (< 10⁶ states): Excellent
  • Medium spaces (10⁶–10⁹): Good with efficient generators
  • Very large spaces: Use hierarchies and modularity

References are listed in the accompanying paper, "RfC (Reinforcement for Creativity): Universal Architecture for Adaptive Creative Agents", available on OSF: https://osf.io/74dxz/overview

Contributions are welcome! Areas of interest:

  • New application domains
  • Improvements to Coaches
  • More sophisticated Generators
  • Performance optimizations

Please open an issue or submit a pull request.

License: MIT

    # Run demos
    python demo_1_math_conjectures.py
    python demo_2_program_synthesis.py
    python demo_3_formal_logic.py

    # Or create your own domain
    python
    >>> from rfc_core import RfC, RfCConfig
    >>> # Your creativity starts here!

RfC: Not just machine learning. Automated creativity.

Made with 🧠 for creative AI
