Show HN: Mixture-of-Models Gateway

3 months ago 1

Mixture-of-Models (MoM) is an advanced pattern that queries multiple large language models simultaneously and synthesizes their responses into a single high-quality output. By leveraging diverse model capabilities and reasoning paths, this approach discovers novel solutions and consistently produces superior results compared to individual models. The technique has demonstrated particular effectiveness in coding tasks, where combined outputs yield improved accuracy, completeness, and creativity by resolving contradictions and filling knowledge gaps across responses.

  • Parallel execution uncovers varied approaches and perspectives across different models
  • Synthesized responses combine strengths of specialized models while mitigating individual weaknesses
  • Collective intelligence identifies errors and contradictions while preserving valuable insights
  • Cross-pollination of concepts from different model architectures reveals innovative solutions
  1. Install dependencies:

    python -m venv .venv && source .venv/bin/activate pip install -r requirements.txt
  2. Configure models in config.yaml with your API keys (you can define environment variables for API keys):

    models: - name: "openrouter" api_key: "${OPENROUTER_API_KEY}"
  3. Start the gateway service:

    python main.py --debug-requests
  4. Connect aider using:

    LM_STUDIO_API_BASE=http://127.0.0.1:8000/v1 \ LM_STUDIO_API_KEY=123 \ aider --model lm_studio/mom --no-stream --editor-model gpt-4.1 --architect

Security Warning: This gateway performs no API key validation. Never expose it beyond localhost as it lacks authentication and security measures suitable for production.

  • OpenAI-compatible /v1/chat/completions endpoint
  • Parallel fan-out to multiple models/configurable via YAML
  • Automatic response synthesis through specialized critic model
  • Environment variable substitution in configuration
  • Debug request tracing (saved to debug-requests/)
  • Customizable model parameters (temperature, max_tokens)
  • Automatic retries with exponential backoff
  • Timeout handling for upstream API calls
  • Streaming responses ("stream": true) are unsupported and return HTTP 400 errors
  • Limited to chat completion endpoints (no embeddings, images, or other modalities)
  • No authentication, rate limiting, or production hardening
  • Static configuration requiring service restart for changes
  • Basic error handling with no advanced fallback mechanisms
  • Critic uses single-stage prompting without multi-step verification
  • Proof-of-concept implementation not suitable for production workloads

This proof-of-concept project encourages exploration of MoM techniques. Any suggestions, bug reports, and pull requests are welcome.

Share findings from experiments with model combinations, critic approaches, or evaluation metrics to advance MoM research.

Read Entire Article