RFC: Reinforcement for Creativity


A universal framework for training creative agents in symbolic domains

RfC (Reinforcement for Creativity) is a learning framework designed to train agents that creatively explore and generate novel solutions rather than simply predict outputs. Unlike traditional neural networks that learn input-output mappings through supervised training, RfC encourages constructive creativity through structured exploration and composite rewards.

| Traditional Neural Networks | RfC Framework |
|---|---|
| Learns to predict outputs | Learns to explore creatively |
| Supervised training | Training with creative incentives |
| Minimizes prediction error | Maximizes validity and novelty |
| Memorizes patterns | Discovers new patterns |

```
┌─────────────┐      ┌──────────────┐     ┌─────────────────┐
│ Environment │─────▶│  Generator   │────▶│ Flexible Coach  │
│  (Context)  │      │   π_θ(a|s)   │     │   (Evaluator)   │
└─────────────┘      └──────────────┘     └─────────────────┘
                            ▲                      │
                            │                      ▼
                            │                ┌───────────┐
                            │                │ Feedback  │
                            │                │ • Validity│
                            └────────────────│ • Novelty │
                                             │ • Reward  │
                                             └───────────┘
```

Environment: Defines the domain and provides context for exploration
Generator: Parameterized agent that generates candidate solutions
Flexible Coach: Evaluator that assesses validity and novelty
RfC Trainer: Orchestrates the creative training loop

Installation: open the .ipynb notebook in Google Colab.

Traditional approach:
"Provide conjecture data and train a model to predict new ones."

RfC approach:
"Explore the space of mathematical conjectures creatively; the Coach evaluates validity and novelty to guide the agent toward maximizing both."

1. Mathematical Conjecture Discovery

Generates novel number theory conjectures
Verifies validity using symbolic proofs
Identifies non-trivial mathematical patterns
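The demo's Coach is described as checking conjectures with symbolic proofs. The sketch below swaps in a cheaper technique: it tests a candidate conjecture on a finite range of n and reports the fraction of cases that pass, plus the first counterexample. Passing every test is evidence, not a proof. The helper names `is_prime`, `empirical_validity`, and `euler_conjecture` are hypothetical and do not come from the demo.

```python
from typing import Callable, Iterable, Optional, Tuple


def is_prime(k: int) -> bool:
    """Trial-division primality test; fine for the small values used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def empirical_validity(
    conjecture: Callable[[int], bool], ns: Iterable[int]
) -> Tuple[float, Optional[int]]:
    """Return (fraction of tested n satisfying the conjecture, first counterexample or None)."""
    ns = list(ns)
    passed = 0
    counterexample = None
    for n in ns:
        if conjecture(n):
            passed += 1
        elif counterexample is None:
            counterexample = n
    return passed / len(ns), counterexample


def euler_conjecture(n: int) -> bool:
    """Candidate conjecture: n^2 + n + 41 is prime for every n >= 0."""
    return is_prime(n * n + n + 41)


print(empirical_validity(euler_conjecture, range(40)))  # (1.0, None)
print(empirical_validity(euler_conjecture, range(41)))  # fails at n = 40, where the value is 41^2
```

Euler's polynomial is a classic trap. It yields primes for n = 0 through 39, so a Coach that only tests small n would wrongly accept it. This is why a proof step matters for validity.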

2. Creative Program Synthesis

Synthesizes efficient algorithms creatively
Explores iterative, recursive, and formula-based approaches
Evaluates correctness and computational efficiency
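A program-synthesis Coach has to score both correctness and cost. This sketch assumes a reference specification (the sum 0 + 1 + ... + n). It checks each candidate against that specification on a set of test inputs and times it with `time.perf_counter`. The three correct candidates mirror the iterative, recursive, and formula-based styles listed above, and `sum_buggy` is a deliberately wrong candidate. All function names are hypothetical. Wall-clock timing is only a rough, noisy proxy for computational efficiency.

```python
import time
from typing import Callable, Dict, List

INPUTS: List[int] = list(range(200))


def reference(n: int) -> int:
    """Specification: 0 + 1 + ... + n."""
    return sum(range(n + 1))


def sum_iterative(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_recursive(n: int) -> int:
    return 0 if n <= 0 else n + sum_recursive(n - 1)


def sum_formula(n: int) -> int:
    return n * (n + 1) // 2


def sum_buggy(n: int) -> int:
    return n * n // 2  # plausible-looking but wrong for every n >= 1


def evaluate(
    candidate: Callable[[int], int], spec: Callable[[int], int], inputs: List[int]
) -> Dict[str, float]:
    """Correctness = fraction of inputs matching the spec; seconds = wall-clock time over the inputs."""
    correctness = sum(candidate(n) == spec(n) for n in inputs) / len(inputs)
    start = time.perf_counter()
    for n in inputs:
        candidate(n)
    seconds = time.perf_counter() - start
    return {"correctness": correctness, "seconds": seconds}


for fn in (sum_iterative, sum_recursive, sum_formula, sum_buggy):
    score = evaluate(fn, reference, INPUTS)
    print(f"{fn.__name__:14s} correctness={score['correctness']:.3f} time={score['seconds']:.6f}s")
```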

3. Logical Rule Discovery

Discovers new inference rules in formal logic
Generates valid inference patterns
Verifies logical soundness and identifies novel proof strategies
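For propositional rules, soundness can be checked exhaustively. A rule is sound exactly when no truth assignment makes all premises true and the conclusion false. The sketch below encodes formulas as Python functions over an assignment dictionary. The encoding and the helper names are illustrative assumptions, not the demo's code. Brute force only scales to a handful of variables, because k variables give 2^k assignments.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def is_sound(premises: Sequence[Formula], conclusion: Formula, variables: List[str]) -> bool:
    """True iff every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # found a counter-model
    return True


# Modus ponens: P -> Q, P  |-  Q
modus_ponens = ([lambda e: implies(e["P"], e["Q"]), lambda e: e["P"]], lambda e: e["Q"])
# Affirming the consequent: P -> Q, Q  |-  P
affirming_consequent = ([lambda e: implies(e["P"], e["Q"]), lambda e: e["Q"]], lambda e: e["P"])
# Hypothetical syllogism: P -> Q, Q -> R  |-  P -> R
hypothetical_syllogism = (
    [lambda e: implies(e["P"], e["Q"]), lambda e: implies(e["Q"], e["R"])],
    lambda e: implies(e["P"], e["R"]),
)

print(is_sound(*modus_ponens, ["P", "Q"]))                 # True
print(is_sound(*affirming_consequent, ["P", "Q"]))         # False
print(is_sound(*hypothetical_syllogism, ["P", "Q", "R"]))  # True
```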

RfC is fully modular and extensible. Here's how to define your own creative domain:

Step 1: Define the Environment

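```python
from rfc_core import Environment  # assumed import path; the README only shows RfC and RfCConfig imported from rfc_core


class MyEnvironment(Environment):
    def sample_context(self):
        # Draw one problem instance for the generator to work on
        return {'type': 'my_problem', 'params': {...}}

    def render_context(self, context):
        # Human-readable description of the context
        return f"Problem: {context}"

    def get_action_space(self):
        # Describe the kinds of candidates the generator may produce
        return {'type': 'my_actions'}
```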

Step 2: Define the Flexible Coach

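```python
from rfc_core import Feedback, FlexibleCoach  # assumed import path


class MyCoach(FlexibleCoach):
    def evaluate_validity(self, candidate, context):
        # Check the candidate against the domain's rules
        validity_score = ...  # Your validation logic
        metadata = {...}
        return validity_score, metadata

    def compute_novelty(self, candidate):
        # Score how different the candidate is from what has been seen before
        novelty_score = ...
        return novelty_score

    def generate_feedback(self, candidate, context, validity_score, novelty_score):
        # Package the scores and improvement hints into structured feedback
        suggestions = [...]
        return Feedback(
            validity_score=validity_score,
            novelty_score=novelty_score,
            metadata={},
            suggestions=suggestions,
        )
```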
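One possible `compute_novelty` scores a candidate by its distance from the closest artifact already stored in a bounded corpus. This sketch makes several assumptions: candidates are strings, `difflib.SequenceMatcher.ratio()` serves as the similarity measure, and the corpus is capped with a `deque(maxlen=...)` in the spirit of the `max_corpus_size` setting in RfCConfig. The class name `CorpusNovelty` and the choice of similarity measure are hypothetical. This is not rfc_core's implementation.

```python
from collections import deque
from difflib import SequenceMatcher


class CorpusNovelty:
    """Novelty = 1 - (highest similarity to any stored artifact); hypothetical helper."""

    def __init__(self, max_corpus_size: int = 1000):
        # A deque with maxlen silently evicts the oldest artifact once full
        self.corpus = deque(maxlen=max_corpus_size)

    def compute_novelty(self, candidate: str) -> float:
        if not self.corpus:
            return 1.0  # nothing to compare against yet
        best = max(SequenceMatcher(None, candidate, seen).ratio() for seen in self.corpus)
        return 1.0 - best

    def add(self, candidate: str) -> None:
        self.corpus.append(candidate)


novelty = CorpusNovelty(max_corpus_size=2)
print(novelty.compute_novelty("abc"))  # 1.0 (empty corpus)
novelty.add("abc")
print(novelty.compute_novelty("abc"))  # 0.0 (exact duplicate)
```

The score stays in [0, 1], so it can be fed straight into a novelty term such as η^α.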

Step 3: Define the Generator

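```python
from rfc_core import Generator  # assumed import path


class MyGenerator(Generator):
    def generate(self, context, temperature=None):
        candidate = ...  # Your generative logic
        return candidate

    def log_prob(self, candidate, context):
        # Log-probability of the candidate under the current policy π_θ(a|s)
        return ...

    def update_parameters(self, gradient):
        # Gradient ascent on expected reward
        self.parameters += self.config.learning_rate * gradient
```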

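Once the three components exist, wire them together, train, and sample new artifacts:

```python
from rfc_core import RfC, RfCConfig

config = RfCConfig()
env = MyEnvironment()
coach = MyCoach()
generator = MyGenerator(config)

rfc = RfC(generator, coach, env, config)
rfc.train(num_episodes=1000)        # run the creative training loop
results = rfc.create(n_samples=10)  # sample 10 artifacts from the trained generator
```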
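The following is a self-contained toy version of the loop that the RfC Trainer orchestrates. It samples a context, generates a candidate, has the Coach score validity and novelty, turns those scores into a reward, and applies a gradient-ascent update. It is a sketch under simplifying assumptions, not rfc_core's implementation. The domain (closed forms for 0 + 1 + ... + n), the class names, the reward (validity plus half the novelty), and the REINFORCE update with a running-average baseline are all illustrative choices.

```python
import math
import random

# Hypothetical toy domain: candidate closed forms for 0 + 1 + ... + n.
CANDIDATES = [
    ("n*(n+1)//2", lambda n: n * (n + 1) // 2),
    ("sum(range(n+1))", lambda n: sum(range(n + 1))),
    ("n*n", lambda n: n * n),
    ("n+1", lambda n: n + 1),
    ("2**n - 1", lambda n: 2 ** n - 1),
]


class ToyEnvironment:
    def sample_context(self):
        # Target behaviour plus the inputs the Coach will test against
        return {"target": lambda n: n * (n + 1) // 2, "inputs": range(10)}


class SoftmaxGenerator:
    """pi_theta(a|s): softmax over one logit per candidate (this toy ignores the context)."""

    def __init__(self, learning_rate=0.5, temperature=1.0, seed=0):
        self.parameters = [0.0] * len(CANDIDATES)
        self.learning_rate = learning_rate
        self.temperature = temperature
        self.rng = random.Random(seed)

    def probs(self):
        # Subtract the max logit for numerical stability before exponentiating
        m = max(self.parameters)
        exps = [math.exp((z - m) / self.temperature) for z in self.parameters]
        total = sum(exps)
        return [e / total for e in exps]

    def generate(self, context):
        return self.rng.choices(range(len(CANDIDATES)), weights=self.probs())[0]

    def grad_log_prob(self, candidate):
        # d/dz_j log softmax(z / T)[a] = (1[j == a] - p_j) / T
        p = self.probs()
        return [((1.0 if j == candidate else 0.0) - p_j) / self.temperature for j, p_j in enumerate(p)]

    def update_parameters(self, gradient):
        # Gradient ascent, as in Step 3: theta += learning_rate * gradient
        self.parameters = [z + self.learning_rate * g for z, g in zip(self.parameters, gradient)]


class ToyCoach:
    """Deterministic evaluator: validity on test inputs; novelty 1.0 the first time a candidate appears."""

    def __init__(self):
        self.corpus = set()

    def evaluate(self, candidate, context):
        name, fn = CANDIDATES[candidate]
        inputs = list(context["inputs"])
        validity = sum(fn(n) == context["target"](n) for n in inputs) / len(inputs)
        novelty = 0.0 if name in self.corpus else 1.0
        self.corpus.add(name)
        return validity, novelty


def train(num_episodes=500, seed=0):
    env, generator, coach = ToyEnvironment(), SoftmaxGenerator(seed=seed), ToyCoach()
    baseline = 0.0
    for _ in range(num_episodes):
        context = env.sample_context()
        candidate = generator.generate(context)
        validity, novelty = coach.evaluate(candidate, context)
        reward = validity + 0.5 * novelty  # simplified stand-in for the composite reward
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)  # running-average baseline
        generator.update_parameters([advantage * g for g in generator.grad_log_prob(candidate)])
    return generator


generator = train()
print({name: round(p, 3) for (name, _), p in zip(CANDIDATES, generator.probs())})
```

After training, most of the probability mass should sit on the two valid closed forms. A real Generator would also condition on the context, and a real Coach would replace the toy checks.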

The main hyperparameters are set through RfCConfig:

```python
from rfc_core import RfCConfig

config = RfCConfig(
    lambda_v=1.0,          # Validity weight
    lambda_n=0.5,          # Novelty weight
    lambda_c=0.01,         # Complexity penalty weight
    tau_v=0.7,             # Validity threshold
    tau_n=0.6,             # Novelty threshold
    alpha=0.8,             # Novelty exponent
    learning_rate=0.001,   # Learning rate
    max_corpus_size=1000,  # Maximum corpus size
    temperature=1.0,       # Sampling temperature
)
```

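The generator is trained to maximize the composite reward:

R(a|s) = λ_v · sigmoid(v - τ_v) + λ_n · η^α - λ_c · C(a)

Where:

v = validity score
η = novelty score
C(a) = complexity of candidate a (the penalty term)
λ_v, λ_n, λ_c = weights on the validity, novelty, and complexity terms (lambda_v, lambda_n, lambda_c)
τ_v = validity threshold (tau_v)
α = novelty exponent (alpha)
sigmoid(x) = 1 / (1 + e^(-x))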
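The reward translates directly into code. The default arguments below copy the example values from the RfCConfig above. The complexity C(a) is passed in as a precomputed number, because the text does not say how it is measured. `composite_reward` is a hypothetical helper name.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def composite_reward(
    v: float,
    eta: float,
    complexity: float,
    lambda_v: float = 1.0,
    lambda_n: float = 0.5,
    lambda_c: float = 0.01,
    tau_v: float = 0.7,
    alpha: float = 0.8,
) -> float:
    """R(a|s) = lambda_v * sigmoid(v - tau_v) + lambda_n * eta**alpha - lambda_c * C(a)."""
    validity_term = lambda_v * sigmoid(v - tau_v)
    novelty_term = lambda_n * eta ** alpha
    complexity_term = lambda_c * complexity
    return validity_term + novelty_term - complexity_term


print(round(composite_reward(v=0.9, eta=0.5, complexity=2), 3))  # 0.817
```

Assume validity scores lie in [0, 1] and τ_v = 0.7. The sigmoid term then ranges only from about 0.33 (v = 0) to about 0.57 (v = 1), so at these settings validity acts as a soft incentive rather than a hard gate. Because α < 1, small novelty values are boosted relative to a linear term.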

Training can also log progress, plot results, and return summary statistics:

```python
stats = rfc.train(
    num_episodes=1000,
    verbose=True,
    log=True,
    plot=True,
)
print(f"Average validity: {stats['avg_validity']}")
print(f"Average novelty: {stats['avg_novelty']}")
print(f"Valid artifacts: {stats['valid_artifacts']}")
print(f"Novel artifacts: {stats['novel_artifacts']}")
```
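The text does not define how these statistics are computed. One plausible reading, sketched here as an assumption, is that they aggregate per-artifact (validity, novelty) scores and use the τ_v and τ_n thresholds from RfCConfig as cutoffs for counting valid and novel artifacts. `summarize` is a hypothetical helper, not part of rfc_core.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize(
    records: List[Tuple[float, float]], tau_v: float = 0.7, tau_n: float = 0.6
) -> Dict[str, float]:
    """Aggregate (validity, novelty) pairs, one per generated artifact."""
    validities = [v for v, _ in records]
    novelties = [n for _, n in records]
    return {
        "avg_validity": mean(validities),
        "avg_novelty": mean(novelties),
        # Assumed: an artifact counts as valid/novel when its score reaches the threshold
        "valid_artifacts": sum(v >= tau_v for v in validities),
        "novel_artifacts": sum(n >= tau_n for n in novelties),
    }


stats = summarize([(0.9, 0.8), (0.5, 0.7), (0.75, 0.2)])
print(stats["valid_artifacts"], stats["novel_artifacts"])  # 2 2
```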

RfC is best suited to domains where:

✅ Validity rules can be clearly defined
✅ Creative exploration is desired
✅ Novelty is as important as correctness
✅ The search space is combinatorial or symbolic

Potential application areas:

Mathematics: Theorem and conjecture discovery
Programming: Algorithm synthesis, code optimization
Formal Logic: New inference rules, proof tactics
Game Design: Novel mechanics, rule balancing
Molecular Design: Valid structures with desired properties
Music Composition: Novel harmonic progressions
Architecture: Creative structural designs

Is RfC a standard RL algorithm?

No. RfC is designed specifically for constructive creativity and differs from standard RL in several ways:

Separates domain knowledge from the agent
Uses a deterministic Flexible Coach with structured feedback
Explicitly rewards novelty, not only correctness
The action space is typically symbolic or combinatorial

Do I need a GPU?

Not necessarily:

Small domains: a CPU is sufficient
Simple generators: MLPs, rules, and templates run well on a CPU
Large domains: a GPU helps when the generator is a Transformer or LLM

Can I use LLMs as the Generator?

Yes! Any generative model can be integrated:

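```python
from rfc_core import Generator  # assumed import path


class LLMGenerator(Generator):
    def __init__(self, config, llm_model):
        super().__init__(config)
        self.llm = llm_model  # any object exposing generate(prompt, temperature=...)

    def generate(self, context, temperature=None):
        # Fall back to the configured sampling temperature when none is given
        if temperature is None:
            temperature = self.config.temperature
        prompt = f"Generate a creative solution for: {context}"
        return self.llm.generate(prompt, temperature=temperature)
```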

How large a search space can RfC handle?

Small spaces (< 10⁶ states): Excellent
Medium spaces (10⁶–10⁹): Good with efficient generators
Very large spaces: Use hierarchies and modularity

The references are drawn from my paper, "RfC (Reinforcement for Creativity): Universal Architecture for Adaptive Creative Agents", available on OSF: https://osf.io/74dxz/overview

Contributions are welcome! Areas of interest:

New application domains
Improvements to Coaches
More sophisticated Generators
Performance optimizations

Please open an issue or submit a pull request.

```bash
# Run the demos
python demo_1_math_conjectures.py
python demo_2_program_synthesis.py
python demo_3_formal_logic.py

# Or create your own domain
python
>>> from rfc_core import RfC, RfCConfig
>>> # Your creativity starts here!
```

RfC: Not just machine learning. Automated creativity.

Made with 🧠 for creative AI
